Abstract


The  US  is  heavily  involved  in  the  first  major  war  of  the  21st  Century  -  The  Global 
War  on  Terrorism  (GWOT).  As  with  any  militant  group,  the  foundation  of  the  enemy’s 
force  is  their  people.  There  are  two  primary  strategies  for  defeating  the  terrorists  and 
achieving  victory  in  the  GWOT.  First,  we  must  root  out  terrorists  where  they  live,  train, 
plan,  and  recruit  and  attack  them  militarily.  Second,  we  must  suffocate  them  by  cutting 
off  the  supply  of  new  soldiers  willing  to  choose  aggression  or  even  death  over  their 
current  life.  This  thesis  helps  to  achieve  these  objectives  by  applying  Multivariate 
Analysis  techniques  to  identify  the  states  most  likely  to  provide  asylum  for  terrorists. 

Weak  and  Failed  States  are  attractive  to  terrorist  groups  looking  for  safe  haven 
and  recruits.  Governments  in  these  states  are  often  unable  to  prevent  illegal  activity,  and 
are  vulnerable  to  corruption  or  takeover.  Citizens  of  failing  states  often  experience 
poverty,  disease,  and  unemployment,  and  may  see  little  hope  for  improvement.  Terrorists 
can  meet  these  disenfranchised  people’s  basic  needs  and  promise  brighter  futures  for 
families  of  those  willing  to  fight  and  perhaps  die  for  the  cause. 

Current  published  efforts  to  identify  failing  states  primarily  use  Ordinary  Least 
Squares  Regression,  which  requires  the  analyst  to  predefine  the  degree  to  which  a  state  is 
likely  to  fail.  This  thesis  uses  a  Factor  Analysis  approach  to  identify  the  key  indicators  of 
state  failure,  and  Discriminant  Analysis  to  classify  states  as  Stable,  Borderline,  or  Failing 
based  on  these  indicators.  Furthermore,  each  nation’s  discriminant  function  scores  are 
used  to  determine  their  degree  of  instability.  The  methodology  is  applied  to  200 
countries  for  which  open  source  data  was  available  on  167  variables.  Results  of  the 
classification  are  compared  with  subject  matter  experts  in  the  field  of  state  failure. 
CLASSIFYING FAILING STATES


1.  Introduction 

1.1.  Background 

The  objective  of  this  thesis  is  to  assist  in  preventing  unstable  nations  from 
collapsing  or  erupting  into  violent  conflict.  This  is  accomplished  by  providing  a 
mathematical  model  to  classify  states  as  weak  or  failing  based  on  currently  collected  and 
available  data.  This  has  the  potential  to  allow  the  necessary  lead  time  for  the 
international  community  to  be  able  to  take  actions  to  avert  a  crisis  and  develop  a  stable 
infrastructure  necessary  to  sustain  lasting  peace.  There  are  two  key  premises  which 
underlie  the  importance  of  such  an  effort. 

First,  when  compared  to  conventional  warfare  and  post-conflict  reconstruction,  it 
is  less  costly  in  terms  of  lives,  dollars,  time,  public  support,  and  foreign  relations  to  take 
actions  in  failing  states  prior  to  their  collapse  or  the  outbreak  of  violent  conflict.  In 
addition,  enhancing  the  capacity  for  nations  to  sustain  themselves  is  more  likely  to 
provide long-term peace in crisis-prone countries (Carment and Schnabel, 2003: 1-2).

As shown in Figure 1-1, the Office of Force Transformation is moving the Department of
Defense  (DoD)  toward  a  strategy  of  dissuading  and  deterring  violent  conflict,  rather  than 
merely  participating  in  it  when  it  becomes  inevitable. 
Figure 1-1: DoD Strategy - Deter Forward
(Office of Force Transformation, 2003: 29)


Specifically, DoD Directive 3000.05 dated 28 November 2005 outlines responsibilities
within the DoD for Stability, Security, Transition, and Reconstruction (SSTR)
Operations. Section 4.2 defines the purpose of these operations:

“Stability  operations  are  conducted  to  help  establish  order  that  advances  US 
interests  and  values.  The  immediate  goal  often  is  to  provide  the  local  populace 
with security, restore essential services, and meet humanitarian needs. The long-term
goal is to help develop indigenous capacity for securing essential services, a
viable  market  economy,  rule  of  law,  democratic  institutions,  and  a  robust  civil 
society.”  (DoDD  3000.05,  2005:  2) 

Second,  US  objectives  in  the  Global  War  on  Terrorism  (GWOT)  further  illustrate 
the  importance  of  this  study.  In  a  White  House  publication  defining  our  national  strategy 
for  combating  terrorism,  President  George  W.  Bush  set  as  one  of  the  United  States’ 
primary  goals  that  of  denying  terrorists  the  sanctuary  they  require  to  carry  out  their  plans 
and  attacks  (Office  of  the  President  of  the  United  States,  2003).  To  do  so,  the 
international community must discern which nations are currently, or are in danger of
becoming,  failed  states.  Weak  states  are  attractive  to  terrorist  cells  looking  for  asylum 
and  recruits  (Forest,  2006:  17-18).  Takeyh  and  Gvosdev  describe  four  key  benefits  failed 
states  provide  to  terrorist  organizations.  First,  they  provide  territory  to  live,  hide  and 
train.  Second,  failed  states  often  lack  legitimate  law  enforcement  capabilities,  leaving 
terrorists  free  to  traffic  drugs  and  amass  funds  and  weapons.  Third,  as  unemployment 
and  poverty  are  common  in  failed  states,  they  provide  terrorist  organizations  access  to 
potential  recruits  looking  for  a  better  way  of  life.  Finally,  the  UN  or  other  foreign 
countries  are  less  likely  to  invade  sovereign  states,  even  if  they  are  in  crisis  (Takeyh  and 
Gvosdev,  2002:  98-101). 

As  resources  available  for  stabilization  efforts  are  always  limited,  the  President 
calls  for  a  plan  to  focus  allied  efforts  on  those  countries  most  in  need  of  international  aid 
(Office  of  the  President  of  the  United  States,  2003:  17).  The  goal  of  this  thesis  then  is  to 
assist  the  US  and  her  allies  in  allocating  their  preemptive  stabilization  efforts  by 
constructing  a  model  to  classify  states  in  terms  of  their  stability  in  order  to  provide  early 
warning  of  a  potentially  failing  state. 

1.2.  Problem  Statement 

Numerous  government  and  non-government  agencies  expend  considerable 
resources  on  collecting  data  on  states  throughout  the  world  (See  Chapter  2  for  a  review  of 
various  studies).  One  of  the  driving  forces  behind  their  efforts  is  a  desire  to  preserve 
human  rights  and  dignity,  and  to  spread  personal  and  political  freedom.  Another 
objective  which  has  taken  center  stage  for  most  nations  is  the  containment  of  global 
terrorism.  Should  terrorists  find  a  foothold  in  failing  states,  it  will  be  more  difficult  and 
costly to suppress their threat to freedom in the future. To illustrate this point, consider
that the US allocated approximately $87 Billion in 2006 attempting to stabilize Iraq
(Congressional Budget Office, 2006: 1), while the House Committee on International
Relations estimates only $52 Million will be required to maintain stability and security in
The Congo over each of the next two years (S. 2125, 2006: 1). Michael Dziedzic
summarized the benefits of being proactive rather than reactive in assisting failing states:

Neglect  is  not  a  strategy.  It  is,  rather,  a  guarantee  that  the  price  of  intervention 
will  inevitably  become  exhaustive.  The  better  alternative  is  to  become  proficient 
at  transforming  internal  conflict.  (Covey  et  al,  2005:  281) 

To  successfully  preserve  human  rights  and  thwart  terrorism,  appropriate  decisions 
must  be  made  regarding  the  allocation  of  precious  stabilization  and  conflict  prevention 
resources  leading  to  greater  cost  savings  in  the  long  run.  This  thesis  contends  that  such 
decisions  can  be  aided  by  a  rigorous  application  of  Operations  Research  (OR)  techniques 
to  currently  collected  and  available  data. 

Several  models  exist  for  predicting  failing  states.  Often,  however,  limited 
justification  is  provided  to  support  the  choice  of  model,  or  the  data  used  in  making 
predictions.  In  this  thesis,  multivariate  statistics  techniques  are  employed  to  help  crisis 
analysts  select  the  appropriate  data  to  collect,  and  to  help  identify  failing  states. 

1.3.  Approach  and  Methodology 

This  thesis  proposes  Factor  Analysis  (FA)  and  Discriminant  Analysis  (DA) 
approaches  to  identify  the  variables  most  significant  in  providing  early  warning  of  failing 
states.  Regression  methods  often  used  in  the  literature  rely  on  predetermined  state  crisis 
scores to serve as dependent variables (Rowlands and Joseph in Carment, 2003; Poe et al,
2006),  which  assumes  the  analyst  can  accurately  define  and  determine  the  level  of  crisis 
in a country a priori, independent of the variables later used to build the regression
model.  In  contrast,  FA  regards  observable  variables  as  reflective  indicators  of 
underlying,  unobservable  factors  (Dillon  and  Goldstein,  1984).  When  primary  factors 
emerge,  variables  reflecting  these  factors  can  be  identified.  DA  can  be  used  to  build  a 
discriminant  function  if  we  have  an  agreed  upon  list  of  currently  failing  or  weak  states, 
without necessitating a quantification of such a listing. Merely identifying states as weak
will allow for an exploration of the observable data available for such states and allow
us to quantify why they might be failing.
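
As a rough illustration of how observable variables can be read as reflections of an underlying factor, the following Python sketch extracts a single dominant factor from a small correlation matrix. It uses the principal-component method of extraction, computed by power iteration, which is a simplification chosen for brevity and is not necessarily the procedure applied in this thesis (see Chapter 3). The indicator names and all data values are hypothetical.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows`."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows)
             / ((n - 1) * sds[a] * sds[b])
             for b in range(p)] for a in range(p)]

def first_factor_loadings(corr, iterations=500):
    """Dominant eigenvalue of `corr` and the loadings of each variable on it."""
    p = len(corr)
    v = [1.0] * p
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then rescale to unit length.
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    # A loading is the eigenvector entry scaled by sqrt(eigenvalue).
    return eigenvalue, [math.sqrt(eigenvalue) * x for x in v]

# Hypothetical data, one row per state:
# [infant mortality, unemployment rate, unrelated indicator]
indicators = [
    [10, 5, 3],
    [20, 8, 9],
    [30, 12, 1],
    [40, 15, 7],
    [50, 21, 2],
    [60, 24, 8],
]

eigenvalue, loadings = first_factor_loadings(correlation_matrix(indicators))
print("eigenvalue:", round(eigenvalue, 3))
print("loadings:", [round(x, 3) for x in loadings])
```

In this made-up data the first two indicators load heavily on the dominant factor while the third does not, which is the sense in which variables "reflecting" a factor can be identified.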

Once  the  key  variables  have  been  identified,  DA  can  further  be  used  to  construct 
an  appropriate  classification  model.  DA  uses  multiple  independent  variables  to  divide  a 
set  of  observations,  for  example  states,  into  two  or  more  categories  such  as  failing  or 
stable.  The  key  variables  determined  using  the  FA  and  DA  techniques  will  serve  as  the 
set  of  independent  variables,  and  DA  will  differentiate  between  countries  in  such  a  way 
as  to  provide  the  least  variation  within  groups,  based  on  the  information  contained  in 
those  variables. 
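
The idea of separating groups while keeping variation within each group small can be sketched with Fisher's two-group linear discriminant. The Python example below is a minimal sketch under stated assumptions: it uses only two groups (Failing and Stable) and two variables, whereas this thesis classifies states into three groups using more variables. The function names, indicator names, and data values are all hypothetical.

```python
def score(w, x):
    """Discriminant score: weighted sum of the two indicator values."""
    return w[0] * x[0] + w[1] * x[1]

def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter(rows, mean):
    """2x2 within-group scatter matrix (sum of outer products of deviations)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s

def fisher_discriminant(group_a, group_b):
    """Weights w = Sw^-1 (mean_a - mean_b) and the midpoint cutoff score."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    # Pooled within-group scatter: the variation the discriminant keeps small.
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    cutoff = 0.5 * (score(w, ma) + score(w, mb))
    return w, cutoff

# Hypothetical training data: [infant mortality per 1,000 births, unemployment %]
failing = [[90, 30], [110, 25], [100, 35], [85, 28]]
stable = [[5, 6], [8, 4], [12, 9], [6, 5]]

weights, cutoff = fisher_discriminant(failing, stable)

def classify(state):
    """Assign a state to the group whose side of the cutoff its score falls on."""
    return "Failing" if score(weights, state) > cutoff else "Stable"

for candidate in ([70, 20], [10, 7]):
    print(candidate, round(score(weights, candidate) - cutoff, 2), classify(candidate))
```

The distance of a state's score from the cutoff gives a crude ordering of states within a group, which is the intuition behind using discriminant function scores as a measure of degree of instability.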

1.4.  Research  Scope 

The  focus  of  this  thesis  is  on  determining  what  factors  are  significant  predictors  of 
failing  states,  and  constructing  a  model  or  models  to  classify  nations  based  on  these 
factors.  Our  first  hypothesis  is  that  there  are  measurable,  statistically  significant 
differences  between  states  currently  defined  as  Stable,  Borderline,  and  Failing.  Our 
second  hypothesis  is  that  as  few  as  ten  variables  currently  being  collected  and  available 
through  open  source  can,  in  general,  be  used  to  accurately  classify  nations  as  weak  or 
failing.  Furthermore,  Discriminant  Analysis,  particularly  when  coupled  with  Factor 
Analysis, is an effective tool for classifying states based on those ten key variables.
Currently  existing  models  are  evaluated  and  discussed.  While  this  study  serves  to  help 
focus  international  data  collection  and  stabilization  efforts,  it  does  not  recommend 
courses  of  action  for  the  US  in  any  specific  failing  nation,  nor  does  it  provide  methods  for 
garnering Congressional or other support for implementing preemptive measures. Figure
1-2  presents  the  basic  progression  of  crisis  intervention  efforts,  with  the  portion 
addressed  by  this  study  highlighted. 


Figure 1-2: Research Scope - Identify Variables and Critical States

If violent conflict were to be thought of as a raging fire, then this thesis purports
to  describe  the  chemistry  of  the  fire’s  fuel.  It  does  not,  however,  examine  the  sparks 
which  ignite  the  fuel.  Often,  as  a  state  fails  and  the  fuel  of  instability  becomes  more  and 
more  explosive,  it  can  remain  dormant  for  long  periods  of  time  in  the  absence  of  a 
triggering  event,  or  spark.  These  triggers  can  come  from  the  government,  the  people,  or 
from  outside  forces  (Brown,  2001:  15-17).  This  thesis  does  not  attempt  to  characterize  or 
predict  the  single  events  that  ignite  conflict,  but  rather  it  describes  the  conditions  in 
which  a  spark  is  most  likely  to  result  in  state  failure. 

Finally,  for  the  purpose  of  maximizing  usability,  the  data  in  this  study  is  limited  to 
open-source.  Often,  data  availability  is  overused  in  determining  which  variables  are 
critical for analysis (Bredel, 2003: 119). It would be an overstatement to conclude that
the  variables  identified  in  this  study  are  the  only  data  to  consider  when  predicting  failing 
states. However, if certain conditions reflected in the data could indeed be sufficient to
declare a nation to be in or approaching crisis, the analysis can be considered adequate if
not entirely comprehensive. Furthermore, the methodologies proposed could readily be
extended  to  any  data  available  to  the  analyst. 

1.5.  Assumptions 

Several  key  assumptions  are  necessary  for  the  proposed  methodology  in  this 
thesis  to  be  relevant.  First,  data  used  to  construct  the  model  was  available  in  open-source 
format  and  the  assumption  is  made  that  similar  data  will  continue  to  be  collected  and 
made  available.  The  usefulness  of  the  proposed  model  is  reliant  on  the  ability  of  analysts 
to  effectively  acquire  the  necessary  data  at  limited  additional  cost.  An  underlying 
secondary  assumption  is  that  analysts  will  use  this  model  to  assist  decision  makers  in 
identifying where the US should focus its attention and perhaps perform a more
intensive  data  collection.  Once  key  states  of  interest  have  been  identified,  a  detailed 
situational  analysis  may  be  in  order  before  attempting  to  justify  the  additional  expense  of 
military,  economic  or  diplomatic  assistance  or,  ultimately,  intervention. 

Whenever  possible,  data  was  collected  from  a  single  source.  However,  when  it 
was  necessary  to  draw  from  multiple  sources,  it  was  assumed  that  each  source  used 
equivalent  collection  and  reporting  methods  unless  otherwise  specified.  Any  violations 
of  this  assumption  have  been  noted. 

1.6.  Overview 

Chapter  2  of  this  thesis  provides  a  review  of  relevant  literature,  including  a 
discussion  of  currently  existing  models  used  to  assist  in  the  early  warning  of  failing 
states.  For  each  model,  the  predictor  variables  used  by  that  model  are  identified.  A  brief 
overview of various OR techniques is also provided in Chapter 2. More detailed
descriptions  of  Factor  Analysis  and  Discriminant  Analysis,  and  how  each  was  applied  to 
construct  the  early  warning  model,  are  outlined  in  Chapter  3.  In  Chapter  4,  several 
models  are  provided  and  applied  to  each  of  the  200  countries  in  the  dataset.  The  results 
are  compared  and  contrasted  with  the  work  of  subject  matter  experts  in  the  field  of  state 
failure.  Chapter  5  concludes  this  study  with  a  discussion  of  the  relevance  of  this  thesis, 
significant  insights  gained  and  recommendations  for  future  research. 
2. Literature Review


2.1.  Introduction 

This  chapter  begins  with  a  review  of  the  pertinent  literature  dealing  with  conflict 
prevention  through  the  use  of  predictive  modeling  of  failing  states.  Included  is  a 
discussion  of  the  merits  of  taking  preemptive  action  to  include  relevant  guidance  from 
US  foreign  policy. 

Various  authors  and  organizations  have  used  mathematical  modeling  to  attempt  to 
predict  failing  states;  the  most  common  approach  has  been  Ordinary  Least  Squares 
Regression  (OLS).  Therefore,  a  review  of  OLS  and  its  underlying  assumptions  is 
provided  next,  as  well  as  current  applications  to  conflict  prevention.  Other,  less  common, 
techniques  are  also  discussed. 

Following  a  review  of  current  approaches  to  conflict  prevention,  the  Operations 
Research  techniques  proposed  in  this  thesis  are  introduced.  A  case  is  made  for 
employing  Factor  Analysis  and  Discriminant  Analysis  to  predict  failing  states,  and  both 
techniques  are  explained. 

2.2.  Crisis  Prevention 

One  can  intuit  that  violence,  death,  poverty,  disease,  extreme  violations  of  human 
rights  and  other  such  occurrences  are  less  desirable  than  peace,  health,  personal  freedom, 
security,  and  life.  It  is  also  clear  that  people  in  various  regions  of  the  world  experience 
varying  levels  of  each  of  the  aforementioned  conditions  at  different  times.  What  is  not  so 
obvious  is  what  causes  a  nation  to  reach  “Crisis”  or  “Failing”  level  and,  for  that  matter, 
exactly  what  constitutes  a  nation  in  crisis.  It  is  these  questions  that  this  thesis  and  the 
literature  reviewed  here  attempt  to  address. 
2.2.1. Terms and Definitions


Throughout  the  literature,  social  economists  and  political  scientists  often  use 
several  terms  and  phrases  interchangeably  to  describe  key  concepts.  What  may  be 
considered  conflict  or  crisis  under  one  definition  may  not  under  another.  This  section 
defines  some  of  the  key  terms  as  they  are  used  in  this  thesis. 

Accelerator  or  Trigger.  A  significant  event  or  change  in  a  key  factor  which  could 
cause  an  unstable  state  to  fail  or  fall  into  crisis  (Schmid,  1998:7).  Triggers  can  be 
absorbed  in  most  cases  by  stable  states  with  little  or  no  catastrophic  effects.  Examples 
include  the  2000  US  Presidential  election  or  illegal  immigration  from  Mexico  into  the 
US,  which  have  not,  to  date,  led  to  national  crisis  by  any  of  the  accepted  definitions. 
Jordan’s expulsion of Palestinians in 1970, however, may be considered a trigger for the
Lebanese  Civil  War  of  1975  (Brown,  2001:  16). 

Aggression.  Aggression  is  simply  the  application  of  armed  force  (Schmid, 
1998:8).  This  refers  to  force  applied  by  national  military  forces,  non-state  groups  within 
a  nation,  or  transnational  groups  when  applied  to  other  nations,  groups,  or  civilian 
population.  The  term  aggression  is  usually  not  used  to  describe  force  applied  by  third- 
party  peacemaking  organizations,  such  as  the  UN. 

Armed  Conflict.  When  aggression  is  applied  between  two  groups,  both  of  which 
possess  weapons  of  war,  it  is  called  Armed  Conflict  (Schmid,  1998:8). 

CNN-Factor.  The  CNN-Factor  refers  to  the  emotional  reaction  of  the  public  to 
media  coverage  of  events  or  conditions.  Debate  continues  as  to  how  a  public’s  reaction 
to  what  they  see  on  TV  can  influence  their  government’s  response  to  a  crisis  in  another 
country (Schmid, 1998:11). A second CNN-Factor concerns people within a country
involved in conflict. Among others, Barnett (2005) and Brown (2001) claim that as
people  in  globally  disconnected  countries  become  aware  of  their  own  conditions  as 
compared  to  the  rest  of  the  country,  region  or  world,  conflict  may  arise  from  their 
perception  of  imbalance  in  standard-of-living. 

Conflict  Prevention.  Proactive  strategy  to  identify  necessary  conditions  for 
stability, and take actions to create those conditions (Carment and Schnabel in Carment,
2003: 11). Note that by this definition, conflict is more synonymous with instability, not
necessarily  violence;  with  the  conditions  which  may  lead  to  violence,  not  the  resulting 
violence  per  se. 

Failing  or  Failed  State.  One  of  the  primary  objectives  of  this  thesis  is  to  refine 
the  definition  of  a  Failing  or  Weak  State  by  identifying  the  key  variables  used  to  measure 
national  stability.  In  broad  terms,  governments  exist  to  provide  the  people  within  a 
defined  region  a  wide  spectrum  of  public  or  political  goods  (Forest,  2006: 18-19).  Use  of 
the  terms  Failing,  Weak,  or  Collapsed  States  earlier  in  this  thesis  refers  to  a  state  in  the 
unquantifiable  condition  of  being  unable  to  successfully  provide  these  services  to  its 
citizens.  Such  states  may  require  outside  intervention  to  avoid  violent  conflict  and 
significant  terrorist  infiltration. 

Genocide.  Genocide  includes  any  attempt  to  destroy  a  group  of  people  for 
religious,  ethnic,  national,  or  racial  reasons  (Schmid,  1998:15).  Actions  may  include,  but 
are  not  limited  to,  armed  conflict,  mass  murder,  prevention  of  birth,  or  child  abduction. 

Peace.  Peace  is  typically  defined  as  the  absence  of  conflict,  instability, 
repression,  and  poverty  or  the  process  of  working  to  achieve  such  a  condition  (Schmid, 
1998:19). 
Peacekeeping. Peacekeeping refers to efforts taken usually in the presence of
latent, increasing conflict to prevent the conflict from escalating into violence (Bredel,
2003: 9-11). Peacekeeping often involves direct third-party involvement in resolving
only the conflict itself, rather than considering the underlying conditions necessary for
long-term peace. Peacekeeping ends when the immediate threat of violence has waned.

Peacemaking. Peacemaking refers to efforts taken during conflict to restore peace
(Bredel, 2003: 9-11). This is the portion of the peace process most commonly associated
with third-party military intervention. However, this pattern is not ideal, as explained by
UN Secretary-General Kofi Annan in his 1998 address in response to the Carnegie Commission
final report on preventing deadly conflict:

“...we seem never to learn. Time and again differences are allowed to develop into
disputes  and  disputes  allowed  to  develop  into  deadly  conflicts.  Time  and  again, 
warning  signs  are  ignored  and  pleas  for  help  overlooked.  Only  after  the  deaths  and 
the  destruction  do  we  intervene  at  a  far  higher  human  and  material  cost  and  with  far 
fewer  lives  to  save.  Only  when  it  is  too  late  do  we  value  prevention.”  (SG/SM/6454, 
1998) 

This  sentiment  underscores  the  importance  of  identifying  and  providing  early  warning  of 
failing  states. 

Peace-building.  Peace-building  refers  to  efforts  taken  in  the  absence  of  conflict, 
either  before  or  after,  to  prevent  future  conflict  (Bredel,  2003:  9-11).  It  is  in  this  stage 
where  this  thesis  proposes  to  provide  the  most  benefit,  with  the  goal  of  assisting  in 
identifying which states, and which factors, require the most attention to achieve
long-term peace.

Stability  Operations.  This  is  an  umbrella  term  encompassing  all  military  and 
civilian  activities  conducted  to  establish  or  maintain  order  (DoDD  3000.05,  2005:  2). 
2.2.2. Recent and Ongoing Crisis Prevention Efforts

Crisis  prevention  is  a  key  objective  for  the  US  as  well  as  the  global  community. 
DoD  directive  3000.05  states  that  stability  operations  are  critical  to  US  interests  and 
values.  One  of  our  most  urgent  interests  is  to  prevent  the  spread  of  terrorism  and  to  root 
out  terrorist  organizations  where  they  live,  work  and  train.  Failed  states,  or  nations  in 
crisis,  provide  ideal  conditions  for  terrorists  to  carry  out  their  missions  (Takeyh  and 
Gvosdev, 2002: 98-101; Forest, 2006: 17-18). To that end, the DoD has outlined roles
and  responsibilities  for  crisis  prevention  across  the  entire  spectrum  of  agencies  to  include 
foreign  and  international  governments,  as  well  as  non-government  (NGO)  and  private 
organizations  (DoDD  3000.05,  2005:  3). 

Dr.  Thomas  P.M.  Barnett,  who  has  been  working  in  national  security  affairs  for  a 
number  of  years,  has  received  significant  attention  for  his  work  in  identifying  and 
predicting  the  nations  most  likely  to  cause  concern  for  the  United  States,  to  include 
failing  or  failed  states.  He  has  published  several  books  and  a  number  of  articles  on  the 
subject  and  has  given  over  1,000  briefings  to  leaders  and  decision  makers  both  in  and  out 
of  the  DoD  and  throughout  the  world  (Barnett  in  Esquire,  2003).  His  books,  The 
Pentagon’s New Map (2004) and its follow-up text Blueprint for Action (2005), provide a
breakdown  of  states  which  are  most  critical  from  a  US  strategic  standpoint.  Barnett 
claims  that  countries  which  are  left  out  of  the  global  economy,  or  “Gap  Countries”,  are 
most  in  danger  of  collapsing,  and  thereby  providing  safe  haven  for  terrorist  groups.  The 
basis  for  the  conclusion  that  these  Gap  countries  are  most  likely  to  be  areas  of  concern  for 
the  US  echoes  the  9/11  Commission  Report: 
Economic openness is essential. Terrorism is not caused by poverty. Indeed, many
terrorists  come  from  relatively  well-off  families.  Yet  when  people  lose  hope, 
when  societies  break  down,  when  countries  fragment,  the  breeding  grounds  for 
terrorism  are  created.  Backward  economic  policies  and  repressive  political 
regimes  slip  into  societies  that  are  without  hope,  where  ambition  and  passions 
have no constructive outlet. (9/11 Commission Report, 2004: 395)

The popularity of Barnett’s work, and its parallels with the 9/11 Commission,
underscore the importance of this study. This thesis complements these efforts by
providing the added benefit of objective, reproducible measures of states’ likelihood of
failure. Barnett’s classification serves as one of the bases for the Discriminant Analysis
used in this thesis; a comparison between his classification and the DA results can be
found in Chapter 4.

On  the  global  scale,  in  2000  the  United  Nations  adopted  a  resolution  agreeing  to 
work  toward  eight  common  goals  directed  at  reducing  poverty,  disease  and  violence; 
increasing  education,  women’s  rights  and  tolerance;  and  protecting  human  rights,  peace, 
and  the  environment  (UN  Resolution  55/2,  2000).  This  resolution  led  to  the  development 
of the Millennium Development Goals (MDG) Indicators, a list of 48 measures used to
gauge  the  world’s  progress  toward  achieving  the  MDG.  The  UN  Common  Database, 
which  contains  the  MDG  data,  can  be  found  at  http://unstats.un.org/unsd/default.htm. 

The  MDG  indicators  are  listed  in  Appendix  A.  Here  too  there  is  great  value  added, 
specifically  by  the  collection  of  observable  data  on  all  countries.  This  thesis  builds  on  the 
UN  data  collection  efforts  by  identifying  which  variables  are  statistically  most  important, 
and  using  those  variables  to  make  predictions  as  to  where  the  next  crisis  may  arise. 

There  are  a  number  of  studies  in  the  literature  attempting  to  predict  failing  states 
and  prevent  conflict.  The  published  work  in  this  area  has  been  primarily  done  by 
economists and political scientists. Some, such as Barnett, apply “soft science”
techniques  based  on  the  idea  that  while  it  may  be  difficult  to  define  or  quantify  crisis,  we 
know  it  when  we  see  it.  Others  have  employed  statistical  techniques,  most  commonly 
Ordinary  Least  Squares  Linear  Regression.  The  following  sections  provide  an  overview 
of  regression  analysis  and  how  it  has  been  used,  a  brief  commentary  on  other  statistical 
techniques found in the literature, and an introduction to the methods used in this study.

2.3. Commonly Used Operations Research Techniques and Applications

2.3.1.  OLS  Regression 

Ordinary  Least  Squares  (OLS)  Linear  regression  is  currently  the  most  common 
method  used  in  the  literature  for  the  prediction  of  failing  states.  This  appears  to  be  so  for 
several  reasons.  First,  if  an  analyst  desires  to  predict  or  estimate  the  future  level  of  a 
response  or  dependent  variable  using  available  data,  e.g.  predicting  future  crisis  levels 
based  on  current  inflation  rates,  OLS  may  be  the  appropriate  tool  (Montgomery,  Peck, 
and Vining, 2001: 11). In addition, when used appropriately, regression can provide a
logical,  intuitive  model  which  illustrates  the  relationship  between  the  variable  of  interest 
and  each  of  the  predictor  variables  used  in  the  model.  Finally,  due  to  its  frequent  use, 
regression  is  widely  understood,  especially  within  the  economic  and  political  science 
fields  (Wonnacott  and  Wonnacott,  1979). 

However,  there  are  potential  drawbacks  to  using  OLS  regression  to  predict  failing 
states.  There  may  be  difficulty  defining  a  response  variable,  violations  of  the 
fundamental  assumptions  of  regression  -  in  particular  severe  multicollinearity  of 
regressors,  and  the  possibility  of  overstating  or  misinterpreting  results.  The  following 
sections examine these issues in turn, and provide rationale for using the alternative
methods  employed  in  this  study. 
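
One of the drawbacks named above, multicollinearity among regressors, can be illustrated with a variance inflation factor (VIF). The Python sketch below covers only the special case of two regressors, where each VIF reduces to 1 / (1 - r^2) with r the correlation between them. The variable names and data values are hypothetical, and the threshold of 10 is a commonly cited rule of thumb rather than a criterion taken from this thesis.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_regressors(x1, x2):
    """With exactly two regressors, each VIF equals 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical regressors measured on six states.
infant_mortality = [10, 20, 30, 40, 50, 60]
unemployment = [5, 8, 12, 15, 21, 24]
unrelated_indicator = [3, 9, 1, 7, 2, 8]

vif_collinear = vif_two_regressors(infant_mortality, unemployment)
vif_independent = vif_two_regressors(infant_mortality, unrelated_indicator)

print("VIF, nearly collinear pair:", round(vif_collinear, 1))
print("VIF, weakly related pair:", round(vif_independent, 2))
```

A large VIF indicates that the two regressors carry nearly the same information, so their individual coefficients in an OLS model would be estimated with inflated variance.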

2.3.1.1. Response Variables

OLS  Regression  analysis  is  used  to  model  the  relationship  between  one  or  more 
independent  variables,  also  called  predictors  or  regressors,  and  a  single  dependent 
variable,  also  known  as  the  response  variable.  For  example,  a  person  may  be  interested 
in  predicting  the  amount  of  time  it  will  take  him  to  get  to  work.  He  might  hypothesize 
that  a  regression  model  should  include  such  regressors  as  distance,  traffic  levels,  and 
weather.  Knowing  the  levels  of  these  three  variables  may  provide  a  decent  prediction  as 
to  the  amount  of  time  needed  to  drive  to  work.  This  simple  example  highlights  an 
important  concept.  The  cornerstone  of  any  regression  analysis  is  the  variable  of  interest  - 
the  response  variable.  In  this  example,  the  average  time  required  to  get  to  work  is  the 
response,  and  it  is  conveniently  objective,  measurable,  and  quantifiable.  However,  if  the
commute  times  varied  widely  or  were  skewed  toward  the  right,  one  might  be  late  too
often  to  keep  one’s  job,  even  if  the  model  of  average  commute  time  were  statistically
significant.  When  attempting  to  predict  failing  states,  we  may  not  even  be  able  to
identify  a  response  variable.
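Returning to the commuting example, a minimal sketch of an OLS fit might look like the following. It is reduced to a single regressor (distance) so the closed-form least squares solution stays visible; the data values and the names `fit_simple_ols`, `distance_km`, and `commute_min` are invented for illustration and do not come from this study.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) minimizing the sum of squared errors."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical observations: distance driven (km) and commute time (minutes).
distance_km = [5, 10, 15, 20, 25]
commute_min = [14, 22, 33, 41, 55]

intercept, slope = fit_simple_ols(distance_km, commute_min)

# Predicted average commute time for a hypothetical 30 km trip.
predicted_30km = intercept + slope * 30
```

The response here (minutes) is directly measurable, which is exactly what is hard to obtain when the quantity of interest is state failure.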

Nearly  every  study  on  predicting  crises  uses  a  different  response  variable.
Rowlands  and  Joseph,  in  Carment’s  book  Conflict  Prevention  (2003),  rated  countries
with  an  integer  from  0  to  3  on  a  conflict  intensity  scale.
A  rating  of  zero  was  given  to  countries  that  experienced  little  or  no  violence  in  a  given
year,  while  a  rating  of  three  went  to  countries  involved  in  major  conflicts  such  as  civil
war  (Carment,  2003:  226-7).  Of  course,  the  aim  of  Rowlands  and  Joseph’s  study  was
only  to  predict  conflict,  not  failing  states.  However,  it  is  a  contention  of  this  study  that
the  two  phenomena  are  intertwined.  For  the  purpose  of  predicting  unstable  and  failing
states,  level  of  conflict  could  certainly  be  considered  important,  but  there  is  a  question  of
whether  it  should  be  a  predictor  or  a  response  variable,  and  whether  it  is  the  only  variable
that  should  be  considered  synonymous  with  crisis.

Nafziger  and  Auvinen  in  a  1997  paper  went  a  step  further  and  proposed  a  single 
dependent  variable  which  was  the  result  of  a  direct  calculation  involving  four  other 
variables:  number  of  people  killed  in  battles,  infant  mortality  rate,  daily  calorie  supply 
per  capita,  and  number  of  refugees  and  displaced  persons  (Nafziger  and  Auvinen,  1997: 
14-17).  They  used  these  proxies  because  they  had  predefined  their  variable  of  interest  as 
a  Complex  Humanitarian  Emergency  (CHE)  score,  where  a  CHE  was  further  defined  as 
having  large  numbers  of  people  dying  or  suffering  from  war,  disease,  hunger  or 
displacement  (Nafziger  et  al:  1).  Again  we  see  that  defining  the  dependent  variable  lays
the  foundation  for  the  regression  analysis,  but  as  before  we  are  left  with  questions  as  to 
how  these  variables  were  chosen  and  assigned  to  be  predictors  or  responses. 

Poe,  Rost  and  Carey  in  2006  use  a  Political  Terror  Scale  (PTS)  derived  by 
Professor  Mark  Gibney  of  the  University  of  North  Carolina  as  an  indication  of  crisis 
manifested  in  the  occurrences  of  human  rights  violations  (Poe  et  al,  2006:  490-1).  Dr.
Gibney  compiled  data  from  various  sources,  primarily  the  US  State  Department  and 
Amnesty  International.  He  and  his  team  assigned  an  integer  score  of  1  to  5  based  on  the 
following  criteria:

Level  1:  Countries  under  a  secure  rule  of  law,  people  are  not  imprisoned  for  their  views,
and  torture  is  rare  or  exceptional.  Political  murders  are  extremely  rare.

Level  2:  There  is  a  limited  amount  of  imprisonment  for  nonviolent  political  activity.
However,  few  persons  are  affected,  torture  and  beatings  are  exceptional.  Political  murder
is  rare.

Level  3:  There  is  extensive  political  imprisonment,  or  a  recent  history  of  such
imprisonment.  Execution  or  other  political  murders  and  brutality  may  be  common.
Unlimited  detention,  with  or  without  a  trial,  for  political  views  is  accepted.

Level  4:  The  practices  of  level  3  are  expanded  to  larger  numbers.  Murders,
disappearances,  and  torture  are  a  common  part  of  life.  In  spite  of  its  generality,  on  this
level  terror  affects  those  who  interest  themselves  in  politics  or  ideas.

Level  5:  The  terrors  of  level  4  have  been  expanded  to  the  whole  population.  The  leaders
of  these  societies  place  no  limits  on  the  means  or  thoroughness  with  which  they  pursue
personal  or  ideological  goals.

(http://www.unca.edu/politicalscience/images/Colloquium/faculty-staff/gibnev.html,
18  October  2006)

In  our  Discriminant  Analysis  (DA),  we  found  Dr.  Gibney’s  Political  Terror  Scale  to  be
valuable  for  discriminating  failing  states.

Each  of  the  aforementioned  studies  attempts  to  explain  or  predict  various  crises 
by  concentrating  on  some  observable  variable(s).  But  we  would  still  need  to  determine 
what  truly  defines  a  national  crisis  or  humanitarian  emergency  or  failing  state  in  order  to 
use  OLS.  In  contrast,  rather  than  identifying  a  single  variable  as  our  crisis  level  indicator, 
this  thesis  proposes  exploratory  Factor  Analysis  to  determine  a  set  of  key  variables  most
useful  in  characterizing  the  overall  status  of  nations.  Discriminant  Analysis  can  then  be
used  to  analyze  the  classification  of  states  on  the  basis  of  these  variables.  This
non-reliance  on  defining  and  calculating  a  dependent  variable  allows  us  to  explore  all
available  data,  and  identify  weak  or  failing  states  based  on  their  similarities  to  countries
whose  overall  status  is  known.

2.3.1.2.  Assumptions


A  second  pitfall  in  using  OLS  Regression  analysis  for  predicting  failing  states 
deals  with  the  assumptions  analysts  must  make  in  order  for  regression  analysis  to  provide 
reasonably  valid  results.  First,  the  relationships  between  the  predictors  and  response 
variable  are  assumed  to  be  at  least  approximately  linear.  Second,  the  error  terms,  which 
are  the  differences  between  each  of  the  observed  values  of  the  response  variable  and  the 
values  predicted  by  the  regression  model,  are  assumed  to  be  normally  distributed  with  a 
mean  of  zero  and  constant  variance.  Third,  the  error  terms  are  assumed  independent  and 
uncorrelated  (Montgomery  et  al,  2001:  131).

The  first  assumption  necessary  for  ordinary  least  squares  regression  (OLS)  to  be 
appropriate  is  that  there  is  at  least  an  approximately  linear  relationship  between  the 
independent  “x”  variables  and  the  dependent  “y”.  Often  a  simple  scatter  plot  of  the  data 
will  confirm  or  refute  the  claim  of  a  linear  relationship.  The  scatter  plots  shown  in  Figure 
2-1  demonstrate  the  dangers  of  violating  the  linearity  assumption. 


Figure  2-1:  Situations  in  which  OLS  a)  is  and  b)  is  not  appropriate
(Montgomery  et  al,  2001:  27)

In  both  cases,  an  identical  OLS  model  may  be  statistically  significant.  However,
a  strictly  linear  model  will  probably  not  sufficiently  explain  the  relationship  between  x
and  y  in  Figure  2-1b.  As  an  example  from  the  failing  states  literature,  Brown  claims  that
the  more  repression  minorities  experience  in  a  nation,  the  more  likely  it  is  that  the  nation
will  erupt  into  war  and  crisis  (Brown,  2001:  29).  However,  Ian  Bremmer  describes  a
different  phenomenon  in  his  2006  book  “The  J-Curve”.  He  contends  that  nations  can  be
effectively  stable  with  a  repressive  government  such  as  in  North  Korea,  or  a  free  and
open  government  such  as  the  US,  and  that  the  truly  unstable  countries  are  those  in
between.  As  nations  move  from  a  “stable,  closed”  state  in  which  minority  repression  is
high  to  a  “stable,  open”  state,  they  will  traverse  a  very  unstable  period  (Bremmer,  2006).
In  other  words,  repression  and  national  stability  may  be  related  in  a  fashion  similar  to
what  is  shown  in  Figure  2-1b  and  not  in  a  truly  linear  fashion.

Similarly,  freedom  of  the  press  is  often  considered  a  sign  of  a  more  stable 
government,  and  therefore  a  more  stable  nation.  However,  Snyder  and  Ballentine
claim  that  without  a  stable  government  already  in  place,  a  free  press  may
actually  cause  more  harm  than  good,  at  least  in  the  short  term  (Snyder  and  Ballentine  in
Brown,  2001).  The  media  can  raise  people’s  awareness  of  the  inequalities  to  which  they
had  previously  been  oblivious,  inciting  aggressive  action  in  some  cases.  This 
phenomenon  is  sometimes  known  as  the  CNN  factor.  An  example  is  television  coverage 
depicting  a  higher  standard  of  living  in  a  neighboring  nation  than  is  experienced  at  home. 

To  address  the  issue  of  non-linear  relationships  between  independent  and 
dependent  variables,  it  is  often  desirable  to  transform  one  or  more  of  the  variables 
(Montgomery  et  al,  2001:  27).  For  instance,  introducing  a  squared  term  in  the  situation
in  Figure  2-1b  may  yield  a  linear  relationship.  Applications  of  such  transformations  are
demonstrated  in  the  methodology  portion  of  this  thesis. 
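The effect of a squared term can be sketched with a small, invented data set. Here a hypothetical "openness" index is related to instability by an inverted U, loosely in the spirit of the J-curve argument. The variable names, the values, and the helper `r_squared` are assumptions made for illustration only.

```python
from statistics import mean


def r_squared(x, y):
    """R-square of the simple OLS line of y on x (squared correlation)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return (sxy ** 2) / (sxx * syy)


# Hypothetical data: instability peaks in the middle of the openness scale.
openness = [-3, -2, -1, 0, 1, 2, 3]
instability = [10 - xi ** 2 for xi in openness]

# A straight line in x explains none of the variation ...
r2_linear = r_squared(openness, instability)

# ... while the transformed regressor x**2 is linearly related to y.
r2_squared_term = r_squared([xi ** 2 for xi in openness], instability)
```

With real data the improvement would be less clean, but the pattern is the same: the transformation, not a different estimator, restores the linear relationship.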

The  second  assumption  in  OLS  is  that  the  residuals,  or  error  terms,  are  normally 
distributed  with  a  mean  of  zero  and  constant  variance.  When  the  variance  of  the  error
terms  is  constant,  the  errors  are  said  to  be  homoscedastic.  If  the  variance  changes  for
different  levels  of  x,  they  are  heteroscedastic.  The  danger  posed  by  heteroscedasticity  is
that  the  OLS  model  may  appear  more  significant  than  it  truly  is;  that  is,  the  confidence
level  of  inferences  made  using  the  model  may  be  artificially  high  (Lattin  et  al,  2003:  58).
Consider  the  scatter  plot  in  Figure  2-2  which  shows  severely  heteroscedastic  data.  The 
OLS  regression  line  may  be  a  good  predictor  of  y  for  lower  levels  of  x,  but  as  x  increases, 
the  model’s  accuracy  decreases.  This  trend  may  not  be  apparent  if  residuals  are  not 
checked. 


Figure  2-2:  Heteroscedasticity 
(Lattin  et  al,  2003:  58) 


Heteroscedasticity  is  most  often,  but  not  exclusively,  encountered  when 
considering  changes  in  variable  levels  over  time  (Wonnacott,  1979:  194).  For  example, 
as  a  state  becomes  involved  in  conflict  of  increasing  intensity  or  seems  to  be  inching
towards  conflict,  not  only  the  number  but  the  variation  in  the  number  of  battle-related
deaths  may  increase.  Transformations  of  variables  may  reduce  heteroscedasticity,  or  an 
analyst  may  employ  Weighted  Least  Squares  which  assigns  lower  weights  to 
observations  with  higher  variances.  This  has  the  effect  of  creating  a  model  whose 
parameters  are  based  to  a  greater  extent  on  data  for  which  the  predictions  are  believed  to 
be  more  accurate. 
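As a rough sketch of Weighted Least Squares for a single regressor, the following code down-weights observations assumed to have larger error variance. The data are invented, and the choice of weights (one over x squared, that is, error variance assumed proportional to x squared) is an assumption for illustration; in practice the variance structure must be estimated or justified.

```python
def fit_weighted_ols(x, y, weights):
    """Weighted least squares for one regressor: minimizes sum(w * error**2).

    Returns (intercept, slope). Equal weights reproduce ordinary OLS.
    """
    w_total = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / w_total
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / w_total
    sxy = sum(w * (xi - x_bar) * (yi - y_bar)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data whose scatter widens as x grows (heteroscedastic).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.3, 7.4, 11.0, 10.2]

# Assumed: error variance proportional to x**2, so weights are 1 / x**2.
weights = [1 / xi ** 2 for xi in x]

ols_intercept, ols_slope = fit_weighted_ols(x, y, [1] * len(x))
wls_intercept, wls_slope = fit_weighted_ols(x, y, weights)
```

The weighted fit leans on the low-x observations, where the predictions are believed to be more accurate.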

The  third  underlying  assumption  in  OLS  is  that  the  error  terms  are  independent 
and  uncorrelated.  Ideally,  the  errors  will  be  random  and  knowing  the  error  at  one  level  of 
the  independent  variable  will  say  nothing  about  the  expected  error  at  the  next  level.  If  a 
pattern  exists  in  the  error  terms  however,  for  example  several  positive  error  values, 
followed  by  several  negative  values,  and  so  on,  autocorrelation  may  be  present.  This  can 
also  be  the  case  if  each  positive  error  tends  to  be  followed  by  a  negative  error  and  vice
versa.  As  with  heteroscedasticity,  autocorrelation  can  lead  to  overstating  the  confidence
of  our  model  (Lattin  et  al,  2003:  61).  Again,  transformations  or  a  Weighted  Least 
Squares  approach  may  be  necessary. 
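One common way to check residuals for the patterns described above is the Durbin-Watson statistic. It is not named in the text and is offered here only as an illustrative diagnostic; the residual sequences are invented.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in time order.

    Values near 2 suggest no first-order autocorrelation, values near 0
    suggest positive autocorrelation (runs of same-sign errors), and values
    near 4 suggest negative autocorrelation (alternating signs).
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals: several positive errors followed by several negative.
runs_of_same_sign = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]

# Hypothetical residuals: each positive error followed by a negative one.
alternating_sign = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

dw_runs = durbin_watson(runs_of_same_sign)
dw_alternating = durbin_watson(alternating_sign)
```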

2.3.1.3.  Multicollinearity

Multicollinearity  occurs  when  two  or  more  of  the  independent  variables  are
correlated.  It  can  also  occur  if  one  of  the  variables  is  close  to  a  linear  combination  of  one
or  more  of  the  others.  With  behavioral  or  economic  data,  some  multicollinearity  is  almost
always  present;  it  is  a  question  of  degree.  An  example  of  where  an  analyst  might  encounter
multicollinearity  would  be  building  a  model  using  average  caloric  intake  per  person  and
percentage  of  the  population  suffering  from  malnourishment.  While  one  measures
calories,  and  the  other  measures  a  percentage,  the  two  variables  are  closely  related.

When  multicollinearity  is  present,  the  variances  and  covariances  of  the  estimated
regression  coefficients  can  become  quite  large,  and  hence  the  confidence  we  have  in  our
model  decreases;  equivalently,  the  confidence  intervals  around  the  coefficients  grow
wider.  If  we  remove  variables  which  are  collinear  with  other  variables  left  in  the  model,
the  model’s  predictions  remain  largely  unchanged,  but  the  confidence  intervals  become
tighter  (Wonnacott,  1979:  354).  This  has  the  added  benefit  of  reducing  the  number  of
variables  required  to  make  predictions.
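The degree of collinearity between two regressors can be summarized by their correlation and the resulting variance inflation factor (VIF), which for two regressors equals 1 / (1 - r^2). The sketch below uses invented values for caloric intake and malnourishment; the VIF is a standard diagnostic introduced here for illustration and is not a quantity reported in this study.

```python
from math import sqrt
from statistics import mean


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    a_bar, b_bar = mean(a), mean(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


# Hypothetical country-level values (not from the thesis dataset).
calories_per_person = [1800, 2000, 2200, 2400, 2600, 2800, 3000, 3200]
malnourished_pct = [38, 33, 27, 22, 16, 13, 8, 5]

r = pearson_r(calories_per_person, malnourished_pct)

# With two regressors, the variance of each coefficient estimate is
# inflated by this factor relative to the uncorrelated case.
vif = 1 / (1 - r ** 2)
```

A VIF this large means the two variables carry almost the same information, which is the situation Factor Analysis is meant to handle without discarding either one.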

Removing  variables  from  our  model,  however,  does  have  the  drawback  of  losing 
whatever  additional  information  those  variables  contain,  minimal  though  it  may  be.  In 
Factor  Analysis,  it  is  not  necessary  to  remove  variables  from  the  model  purely  on  the 
basis  of  multicollinearity.  Instead,  we  may  reduce  the  dimensionality  of  the  data  into  its 
primary  factors  while  retaining  all  available  information. 

2.3.1.4.  Interpretation  of  Results

It  can  often  be  difficult  to  interpret  the  results  of  any  analysis  and  recommend 
courses  of  action  based  on  those  results.  OLS  Regression  is  no  exception.  Once  the 
appropriate  variables  are  selected  and  the  best  possible  model  is  created  from  these 
variables,  analysts  are  still  left  to  attempt  to  explain  what  the  model  tells  us,  and  of  course 
what  it  does  not. 

First,  we  must  be  careful  not  to  assign  causal  relationships  to  variables  in  the 
OLS  model.  Even  if  a  correlation  has  been  identified,  it  is  not  necessarily  true  that
changing  the  value  of  one  variable  in  the  real  world  will  directly  affect  the  other
(Montgomery  et  al,  2001:  42).  Rowlands  and  Joseph  in  Carment,  2003  provide  an
excellent  example  of  such  a  dilemma.  In  their  effort  to  explore  the  causes  of  internal
conflict,  they  found  that  if  the  average  inflation  rate  of  a  country  increases,  so  does  that
country’s  level  of  internal  conflict.  Conversely,  as  involvement  of  the  International
Monetary  Fund  (IMF)  increases,  internal  conflict  tends  to  decline  (Rowlands  and  Joseph
in  Carment,  2003:  217).  The  authors  caution  against  claiming  that  reducing  inflation
would  necessarily  cause  a  reduction  in  the  level  of  internal  conflict,  and  remind  the
reader  that  the  IMF  has  anti-inflationary  policies  as  a  condition  of  its  financial  support
(Rowlands  and  Joseph  in  Carment,  2003:  217).  In  other  words,  the  reality  of  the  situation
may  be  that  IMF  involvement  actually  causes  both  inflation  and  civil  conflict  to  decrease, 
and  that  simply  reducing  inflation  will  not  by  itself  lead  to  decreasing  conflict.  It  may 
also  be  true  that  none  of  the  aforementioned  variables  are  causally  related  to  each  other  at 
all.  A  third  possibility,  and  the  one  explored  further  in  this  study,  is  that  they  are  merely 
reflections  of  some  underlying  factor  not  fully  accounted  for  in  the  model. 

A  second  issue  that  arises,  even  with  a  perfectly  constructed  model,  is 
determining  and  reporting  how  well  the  model  actually  describes  the  data.  Essentially, 
we  need  a  measure  of  how  accurate  we  believe  the  model  to  be  as  a  predictor  of  the 
dependent  variable.  A  reader  must  be  careful  not  to  draw  conclusions  or  take  radical 
action  based  on  predictions  from  a  poorly  fit  model.  Often,  in  the  failing  state  literature, 
the  exclusive  measure  of  how  well  the  model  fits  the  data  is  the  coefficient  of 
determination,  also  called  R-square.

Simply  stated,  the  R-square  value  of  an  OLS  Regression  model  is  the  proportion
of  variation  in  the  dependent  variable  which  can  be  explained  by  the  independent 
variables  included  in  the  model  (Montgomery  et  al,  2001:  39-40).  It  is  the  ratio  of  the 
amount  of  variation  explained  by  the  regressors  to  the  total  amount  of  variation  in  the 
dependent  variable  (Montgomery  et  al,  2001:  39). 

$$R^2 = \frac{SSR}{SST} = \frac{\text{Regression Sum of Squares}}{\text{Total Sum of Squares}} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$$

Where

$y_i$  =  observed  value  of  the  ith  response
$\hat{y}_i$  =  predicted  value  of  the  ith  response
$\bar{y}$  =  mean  of  responses

R-square  values  range  from  0  to  1,  with  values  close  to  1  indicating  that  most  of 
the  variation  in  y  can  be  explained  by  x.  This  does  not  guarantee  that  the  model 
sufficiently  explains  the  relationship  between  y  and  x,  however.  Referring  back  to  Section
2.3.1.2,  the  R-square  values  for  the  OLS  models  in  Figure  2-1  may  be  equal  and  close  to  1,
but  the  model  is  clearly  more  representative  of  the  data  shown  in  Figure  2-1a
(Montgomery  et  al,  2001:  40). 
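The R-square ratio defined above can be computed directly from its two sums of squares. The following is a minimal sketch on invented data; the helper names `ols_line` and `r_square` are hypothetical.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the simple OLS line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def r_square(y, y_hat):
    """Regression sum of squares divided by total sum of squares."""
    y_bar = mean(y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return ssr / sst


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = ols_line(x, y)
y_hat = [b0 + b1 * xi for xi in x]
r2 = r_square(y, y_hat)

# For an OLS fit with an intercept, the same value results from
# 1 - (error sum of squares) / (total sum of squares).
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - mean(y)) ** 2 for yi in y)
r2_alt = 1 - sse / sst
```

Here 60 percent of the variation in y is explained by x, yet the value alone says nothing about whether a straight line is the right functional form.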

While  a  large  R-square  value  does  not  guarantee  that  our  model  is  sufficient,  a 
low  R-square  does  suggest  that  our  model  is  inadequate,  or  at  least  does  not  explain  a 
substantial  amount  of  the  variation  in  our  item  of  interest.  There  is  no  universally 
recognized  minimum  acceptable  value  for  R-square,  but  low  values  indicate  the  potential 
for  more  comprehensive  measures  to  provide  useful  insights.  During  this  literature
review,  we  encountered  no  R-square  values  greater  than  0.20,  suggesting  there  is
significant  room  for  improving  our  abilities  to  predict  failing  states. 

2.3.2.  Other  Techniques  from  the  Literature 

Regression  analysis  is  by  far  the  most  commonly  used  technique  for  predicting 
failing  states  found  in  the  literature.  However,  several  other  methods  have  been 
employed.  The  Fund  for  Peace  uses  a  Conflict  Assessment  System  Tool  (CAST)  to  scan 
published  articles  from  around  the  world  for  key  words  like  “repression”,  “war”,  and 
“famine”.  CAST  highlights  articles  based  on  the  frequency  of  these  indicator  words  or 
phrases,  and  those  articles  are  reviewed  by  subject  matter  experts  (Baker,  2005:  slide  9). 
The  experts  assign  a  final  score  for  each  country  in  each  of  twelve  categories:  Mounting 
Demographic  Pressures,  Massive  Movement  of  Refugees  and  IDPs,  Legacy  of 
Vengeance-Seeking  Group  Grievance,  Chronic  and  Sustained  Human  Flight,  Uneven
Economic  Development  along  Group  Lines,  Sharp  and/or  Severe  Economic  Decline, 
Criminalization  or  Delegitimization  of  the  State,  Progressive  Deterioration  of  Public 
Services,  Widespread  Violation  of  Human  Rights,  Security  Apparatus  as  "State  within  a 
State",  Rise  of  Factionalized  Elites,  and  Intervention  of  Other  States  or  External  Actors 
(http://www.fundforpeace.org/programs/fsi/fsindex2006.php,  2006).  The  scores  are 
summed  and  the  nations  are  ranked  in  order  of  likelihood  of  failure.  The  Fund  for
Peace’s  2006  Failed  States  Index  serves  as  a  second  a  priori  classification  for  this  thesis.
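The final summing and ranking step can be sketched in a few lines. The country names, the score values, and the score scale below are invented; only the idea of summing twelve category scores and ranking by the total comes from the description above, and treating a higher total as a higher likelihood of failure is an assumption of this sketch.

```python
# Hypothetical expert scores: twelve category scores per country.
scores = {
    "Country A": [9, 8, 9, 7, 8, 9, 9, 8, 9, 9, 8, 9],
    "Country B": [3, 2, 4, 3, 2, 3, 2, 3, 2, 2, 3, 2],
    "Country C": [6, 7, 5, 6, 6, 5, 7, 6, 6, 5, 6, 7],
}


def rank_by_total(category_scores):
    """Sum each country's category scores and sort from highest to lowest."""
    totals = {country: sum(values)
              for country, values in category_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_total(scores)
```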

Recall  that  when  the  data  being  used  do  not  appear  to  be  linearly  related  to  the 
response  variable,  variable  transformations  may  be  used.  For  example,  Nafziger  and
Auvinen  (1997)  found  that  taking  the  natural  logarithm  of  all  independent  variables
resulted  in  a  stronger  linear  relationship  with  their  Complex  Humanitarian  Emergency
(CHE)  score  (Nafziger  and  Auvinen,  1997:  14-17).  Deciding  which  variables  to
transform,  and  which  transformation  to  use,  is  no  simple  task.  Throughout  this  literature 
review,  we  found  only  logarithmic  transformations  to  have  been  used  to  date. 

2.4.  Analysis  Techniques  Considered  for  This  Thesis 

This  section  provides  a  brief  overview  of  each  of  the  techniques  used  in  this 
thesis.  It  begins  with  a  discussion  of  various  methods  for  dealing  with  missing  data.  This 
is  followed  by  an  introduction  to  Factor  Analysis  and  Discriminant  Analysis. 

2.4.1.  Missing  Data 

Invariably,  once  a  researcher  leaves  the  confines  of  a  controlled  environment  and 
begins  collecting  “real-world”  data,  he  will  be  confronted  with  the  problem  of  missing  or 
incomplete  data  (Allison,  2001:  1).  For  this  study,  165  variables  were  collected  on  242 
countries  covering  the  period  1995-2005.  However,  not  surprisingly,  almost  half  of  the 
data  were  missing.  This  section  discusses  some  of  the  common  terminology  and
techniques  used  to  address  the  issue  of  missing  data.  For  comparison’s  sake,  common 
techniques  found  in  the  scholarly  literature  are  reviewed,  though  not  all  were  utilized  in 
this  study.  Details  on  the  implementation  of  the  specific  methods  used  in  this  thesis  can 
be  found  in  Chapter  3. 

2.4.1.1.  Terms  and  Definitions

Missing  Completely  at  Random  (MCAR).  Data  are  said  to  be  missing  completely 
at  random  if  the  probability  of  encountering  missing  data  for  a  particular  variable  is  not 
dependent  on  the  value  of  that  variable,  or  any  other  in  the  dataset  (Allison,  2001:  3).
This  would  be  the  case  if,  for  example,  the  probability  of  being  able  to  find  the  GNP  of  a
country  was  in  no  way  related  to  that  country’s  GNP  or  any  other  factor.  If  the
probability  of  finding  a  country’s  GNP  increased  as  the  value  of  GNP  increased,  the  data 
would  not  be  MCAR.  When  data  are  MCAR,  the  observed  data  can  be  regarded  as  a 
random  sample  of  the  original  dataset  (Allison,  2001:  3). 

Missing  at  Random  (MAR).  Data  are  said  to  be  missing  at  random  if  the 
probability  of  data  missing  on  one  variable  is  independent  of  the  value  of  that  variable, 
once  all  other  variables  have  been  considered  (Allison,  2001:  4).  This  would  be  the  case 
if  GNP  was  more  likely  to  be  missing  for  smaller  countries,  but  within  the  groups  of 
smaller  and  larger  countries,  the  probability  of  GNP  missing  did  not  depend  on  the  value 
of  GNP  itself. 

Missing  Data  Mechanism  (MDM).  The  missing  data  mechanism  is  the  set  of 
parameters  that  describe  the  probability  structure  of  missing  data  (Allison,  2001:  5). 

Ignorable  MDM.  In  some  cases  the  missing  data  mechanism  can  be  ignored  when 
performing  analysis.  This  is  true  if  the  data  are  MCAR,  or  MAR  and  the  MDM  is 
unrelated  to  the  parameters  of  interest  (Allison,  2001:  5).  Most  often,  MAR  implies
Ignorable  MDM  (Allison,  2001:  5). 

Non-ignorable  MDM.  If  the  data  are  not  MAR,  the  MDM  should  be  included 
when  estimating  the  parameters  of  interest  as  it  contains  some  information  about  the  true
structure  of  the  data  (Allison,  2001:  5).  For  example,  since  it  may  be  valid  to  assume  that 
failing  or  failed  states  are  more  likely  to  experience  missing  data,  a  variable  called  “Data 
Availability”  was  added  to  our  dataset.  This  variable  was  equal  to  the  percentage  of  data 
filled  in  for  each  country  on  all  other  variables.  This  variable  could  capture  at  least  a 
portion  of  the  MDM. 
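A "Data Availability" variable of this kind can be computed as the share of populated fields in each country's record. The sketch below is a hypothetical illustration; the variable names and values are invented, and missing entries are represented by `None`.

```python
def data_availability(record):
    """Percentage of variables in a record that are populated (not None)."""
    filled = sum(1 for value in record.values() if value is not None)
    return 100.0 * filled / len(record)


# Hypothetical records for two countries.
countries = {
    "Country A": {"gnp": 410.0, "infant_mortality": 12.0,
                  "calories": 3100.0, "refugees": 0.0},
    "Country B": {"gnp": None, "infant_mortality": 95.0,
                  "calories": None, "refugees": None},
}

availability = {name: data_availability(record)
                for name, record in countries.items()}
```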


2.4.1.2.  Common  Missing  Data  Techniques


List-wise  Deletion.  List-wise  deletion  involves  removing  any  observations  from 
the  dataset  which  have  any  missing  values  (Allison,  2001:  6).  This  has  the  advantages  of 
being  easy  to  implement  and  leaving  the  analyst  with  a  dataset  containing  no  missing 
values,  but  potentially  discards  valuable  information  contained  in  the  incomplete  records. 
For  this  study,  no  country  was  populated  for  every  variable  for  every  year.  Therefore, 
list-wise  deletion  would  result  in  an  empty  dataset. 

Pair-wise  Deletion.  Pair-wise  deletion  is  less  wasteful  than  list-wise  deletion  in 
that  it  takes  each  variable  in  the  dataset  two  at  a  time  and  calculates  the  means,  standard 
deviations,  and  covariances  or  correlations  using  all  of  the  data  available  for  both 
variables  (Allison,  2001:  6).  Therefore,  while  list-wise  deletion  would  entirely  leave  out 
a  record  with  any  missing  data,  pair-wise  deletion  would  only  exclude  each  record  from 
calculations  involving  the  variable(s)  for  which  that  record  was  lacking  data.  However, 
pair-wise  deletion  produces  biased  standard  errors  and  test  statistics,  and  performs  worse 
than  list-wise  deletion  when  correlation  among  variables  is  high  (Allison,  2001:  9).
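The difference between the two deletion schemes can be seen by counting how many records each leaves available. The records below are invented, with `None` marking a missing value; the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical records with three variables each.
rows = [
    {"x": 1.0, "y": 2.0, "z": None},
    {"x": 2.0, "y": None, "z": 5.0},
    {"x": 3.0, "y": 6.0, "z": 7.0},
    {"x": None, "y": 8.0, "z": 9.0},
]


def listwise(records):
    """Keep only records with no missing values at all."""
    return [r for r in records if all(v is not None for v in r.values())]


def pairwise_counts(records, variables):
    """Number of records usable for each pair of variables."""
    return {
        (a, b): sum(1 for r in records
                    if r[a] is not None and r[b] is not None)
        for a, b in combinations(variables, 2)
    }


complete = listwise(rows)
usable = pairwise_counts(rows, ["x", "y", "z"])
```

List-wise deletion keeps one record of four, while every pair of variables still has two usable records under pair-wise deletion.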

Imputation.  Imputation  describes  any  method  for  replacing  missing  values  with 
some  logical  estimate  of  their  true  value  (Allison,  2001:  12).  Three  of  the  more  common
imputation  methods  are  Mean  Substitution,  Hot-Deck  Imputation,  and  Multiple
Regression  (Chantala  and  Suchindran,  2006:  9),  which  are  described  next.  It  should  be
noted  that  if  imputed  datasets  are  analyzed  with  no  account  for  the  uncertainty  in  the
imputed  values,  results  will  appear  as  though  all  data  used  were  authentic.  This  means  the
standard  errors  will  be  artificially  small  and  the  test  statistics  artificially  large
(Chantala  and  Suchindran,  2006:  9).

Mean  Substitution.  Mean  substitution  is  done  by  simply  replacing  missing  values
with  the  mean  of  the  available  values  for  each  variable  (StatSoft,  2006:  n.pag.).  This
method  leads  to  artificially  low  estimates  of  variance  and  standard  errors  and  should  be
avoided  if  at  all  possible  (Allison,  2001:  11).
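The shrinkage in variance caused by mean substitution is easy to demonstrate. In this sketch the observed values and the number of missing entries are invented.

```python
from statistics import mean, variance

# Hypothetical populated values for one variable.
observed = [2.0, 4.0, 6.0, 8.0]

# Hypothetical number of missing entries for the same variable.
n_missing = 4

# Every missing entry is replaced by the mean of the observed values.
imputed = observed + [mean(observed)] * n_missing

var_observed = variance(observed)   # sample variance of the real data
var_imputed = variance(imputed)     # sample variance after substitution
```

The imputed values add observations without adding any spread, so the sample variance falls from about 6.67 to about 2.86.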

Multiple  Regression  Imputation.  Multiple  regression  imputation  uses  those 
variables  for  which  we  have  data  and  regresses  the  missing  variable  on  them  to  generate  a 
predicted  value  for  the  missing  data  (Allison,  2001:  11).  For  example,  if  we  have  three
variables  X,  Y,  and  Z,  with  30%  of  the  data  missing  for  variable  Z,  we  can  regress  Z  on
X  and  Y  for  the  complete  cases,  and  use  the  resulting  equation  to  generate  predicted
values  for  Z  in  the  incomplete  cases.  The  assumption  here  is  that  the  populated  variables
are  good  predictors  of  the  incomplete  ones.  This  method  becomes  quite  complicated
when,  as  in  our  case,  data  are  missing  on  more  than  one  variable  (Allison,  2001:  11-12).
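The X, Y, Z example can be sketched as follows. The data are invented so that Z is an exact linear function of X and Y, which makes the imputed value easy to check by hand; real data would include error. The helpers `solve` and `regress` are hypothetical and solve the normal equations directly.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def regress(records, target, predictors):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    design = [[1.0] + [r[p] for p in predictors] for r in records]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * r[target] for d, r in zip(design, records))
           for i in range(k)]
    return solve(xtx, xty)


# Hypothetical records; z is missing (None) in the last one.
rows = [
    {"x": 1.0, "y": 2.0, "z": 9.0},
    {"x": 2.0, "y": 1.0, "z": 8.0},
    {"x": 3.0, "y": 4.0, "z": 19.0},
    {"x": 4.0, "y": 3.0, "z": 18.0},
    {"x": 5.0, "y": 5.0, "z": None},
]

# Regress Z on X and Y using the complete cases only.
complete = [r for r in rows if r["z"] is not None]
b0, bx, by = regress(complete, "z", ["x", "y"])

# Use the fitted equation to fill in Z for the incomplete cases.
for r in rows:
    if r["z"] is None:
        r["z"] = b0 + bx * r["x"] + by * r["y"]
```

When several variables are missing at once, a separate regression is needed for each missingness pattern, which is the complication noted above.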

Maximum  Likelihood.  Essentially,  the  method  of  maximum  likelihood  estimates 
population  parameter  values  by  selecting  those  which  would  maximize  the  likelihood  of 
the  observed  data  (Wackerly,  2002:  449).  Using  the  maximum  likelihood  parameters, 
missing  values  are  imputed  by  randomly  drawing  from  the  estimated  population 
distribution.  This  technique  relies  heavily  on  the  assumption  that  data  are  MCAR,  as 
imputed  values  for  a  given  variable  will  be  drawn  in  the  same  manner  for  each 
observation,  independent  of  any  of  the  other  variables  in  the  dataset. 

Hot-Deck  Imputation.  Hot  Deck  Imputation  refers  to  any  technique  in  which  the 
missing  values  are  replaced  by  actual  observed  values  for  some  other  record  in  the 
dataset,  as  opposed  to  means  or  random  draws  from  a  distribution.  One  benefit  of  such  a 
technique  is  that  we  are  guaranteed  to  replace  the  missing  data  with  a  feasible  value,  as  it
had  to  be  observed  originally  to  be  a  candidate.  This  prohibits  draws  from  outside  the
true  range  of  the  variable  and  preserves  non-negativity  constraints.  Furthermore,  the 
nature  of  the  variable  such  as  categorical  or  binary  is  preserved.  This  is  similar  to  Mean 
Substitution,  but  rather  than  using  the  mean  of  the  entire  sample,  different  values  may  be 
used  to  replace  each  missing  value.  While  the  variation  in  such  a  dataset  will  still  be 
artificially  low,  it  will  be  greater  than  that  of  using  the  mean  for  every  imputation.  There 
are  multiple  ways  to  determine  which  value  to  substitute  in  each  case.  One  such  method
is  discussed  next. 

Nearest  Neighbor  Hot-Deck  Imputation.  Nearest  Neighbor  Imputation  (NNI)  is 
an  intuitive  and  easily  implemented  approach  to  addressing  the  issue  of  missing  data,  and 
is  the  technique  employed  in  this  thesis.  As  its  name  indicates,  NNI  seeks  to  find  the 
observation  in  the  dataset  most  similar  to  the  observation  for  which  some  data  is  missing. 
The  observation  containing  the  missing  value  is  called  the  recipient,  and  the  nearest 
neighbor  is  the  donor.  Note  that  the  roles  may  be  reversed  if  the  donor  is  missing  a 
different  value  for  which  the  recipient  is  populated.  For  the  purposes  of  this  thesis,  this 
assumes  that  countries  which  are  most  similar  according  to  a  relatively  large  number  of 
criteria  will  also  be  similar  in  those  areas  for  which  only  one  country  has  been  assessed. 
Chapter  3  provides  details  on  how  NNI  was  implemented  in  this  study. 
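The general idea of nearest neighbor hot-deck imputation can be sketched as follows. This is not the implementation used in this study, which is described in Chapter 3. The distance measure (root mean squared difference over variables populated in both records), the assumption that variables are already on comparable scales, and all names and values are choices made for illustration.

```python
import math


def distance(a, b):
    """Root mean squared difference over variables populated in both records."""
    shared = [k for k in a if a[k] is not None and b[k] is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared) / len(shared))


def impute_nearest_neighbor(records):
    """Fill each missing value with the observed value from the closest donor.

    Donors must have an originally observed value for the variable, so
    imputed values are never passed on to other recipients.
    """
    filled = {name: dict(rec) for name, rec in records.items()}
    for name, rec in records.items():
        for var, value in rec.items():
            if value is None:
                donors = [d for d in records
                          if d != name and records[d][var] is not None]
                if donors:
                    donor = min(donors,
                                key=lambda d: distance(rec, records[d]))
                    filled[name][var] = records[donor][var]
    return filled


# Hypothetical, pre-scaled records; country B is missing v3.
records = {
    "A": {"v1": 0.1, "v2": 0.2, "v3": 0.9},
    "B": {"v1": 0.2, "v2": 0.1, "v3": None},
    "C": {"v1": 0.9, "v2": 0.8, "v3": 0.1},
}

filled = impute_nearest_neighbor(records)
```

Country B resembles country A on the variables both report, so A serves as the donor and B receives A's observed value for `v3`.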

In  recent  years,  developments  in  computer  software  have  made  it  possible  to 
employ  missing  data  techniques  which  are  superior  to  those  discussed  earlier.  The 
techniques  themselves  are  not  new,  but  their  computational  feasibility  has  recently 
become  a  reality  (Allison,  2001:  2).  Multiple  Imputation  with  Data  Augmentation  is  one 
such  technique.  Unfortunately,  software  constraints  prohibited  us  from  utilizing  these
methods;  they  are  included  in  Chapter  5  as  a  possible  area  for  improving  on  this  effort
through  additional  analysis. 

2.4.1.3. Summary of Missing Data Discussion

Missing data is an issue that commonly occurs in the social sciences and other
fields.  While  no  technique  could  possibly  perform  better  than  actually  finding  the 
missing  values,  several  techniques  for  dealing  with  missing  data  are  available.  For  this 
thesis,  Nearest  Neighbor  Imputation  (NNI)  was  selected  for  its  intuitive  approach  and 
ease  of  use.  Chapter  3  provides  details  on  the  implementation  of  NNI. 

2.4.2.  Factor  Analysis 

Factor  Analysis  (FA)  is  a  technique  used  to  reduce  the  dimensionality  of  a  set  of 
data  and  explore  relationships  among  its  variables  (Lattin  et  al,  2003:  127).  It  does  this 
by  grouping  variables  whose  previous  relationship  was  unknown  through  identifying 
underlying  dimensions,  or  factors,  reflected  in  those  variables  (Dillon  and  Goldstein, 
1984:  53).  The  result  of  FA  is  often  the  identification  of  a  small  set  of  “unobservable 
characteristics” which explain a great deal of the information and variation present in the
much  larger  dataset  (Lattin  et  al,  2003:  128-129).  Readers  familiar  with  Principal 
Components  Analysis  (PCA)  will  find  FA  similar  in  solution  methodology,  though 
slightly  different  in  underlying  assumptions  and  interpretation.  For  our  purposes,  we 
assume  that  the  observable  data  we  have  available  are  actually  measurements  of  some 
unobservable  characteristics  such  as  State  Stability.  FA  is  designed  to  uncover  the 
concepts  or  ideas  that  are  truly  of  interest,  but  may  not  be  directly  measurable.  Such  a 
technique can be valuable for the purpose of understanding and condensing the myriad of
data available on the nations of the world. This section provides an overview of FA as
well  as  how  it  can  be  used  for  the  purposes  of  assessing  and  comparing  nations  and 
identifying  failing  states. 

2.4.2.1. Overview of Factor Analysis

When  using  FA,  we  assume  that  the  variation  in  a  variable  within  a  dataset  comes 
from two sources: the variance unique to that variable, and the variation that is common,
or  shared,  among  the  several  variables  reflecting  a  common  underlying  factor  (Dillon  and 
Goldstein,  1984:  56).  For  this  reason,  FA  is  often  referred  to  as  Common  Factor 
Analysis.  The  original  indicator  variables  are  then  thought  of  as  functions  of  these 
unobservable  common  factors  (Dillon  and  Goldstein,  1984:  57).  The  purpose  of  FA  is  to 
identify  this  relatively  small  number  of  underlying  characteristics  and  use  these,  or  a 
subset  of  the  original  variables  heavily  loaded  on  them,  as  a  substitute  for  the  complete 
dataset  for  analysis  and  prediction. 
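
The two sources of variation described above are often written as follows. This is a sketch in standard common-factor notation, which is an assumption here rather than notation taken from this thesis: x_j is the j-th standardized indicator variable, f_1 through f_m are uncorrelated common factors with unit variance, the lambda terms are factor loadings, and epsilon_j is the unique term, assumed uncorrelated with the factors.

```latex
\[
x_j = \lambda_{j1} f_1 + \lambda_{j2} f_2 + \cdots + \lambda_{jm} f_m + \varepsilon_j ,
\qquad j = 1, \dots, p
\]
\[
\operatorname{Var}(x_j)
  = \underbrace{\sum_{k=1}^{m} \lambda_{jk}^{2}}_{\text{common variance}}
  + \underbrace{\psi_j}_{\text{unique variance}} ,
\qquad \psi_j = \operatorname{Var}(\varepsilon_j)
\]
```

Under these assumptions, a variable is "heavily loaded" on a factor when the corresponding squared loading accounts for a large share of its variance.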

The  purpose  and  benefit  of  FA  can  be  better  understood  through  an  example. 
Section 5.3 of Lattin et al., 2003 recounts a 1991 study investigating consumers’
preference  in  breakfast  cereals  (Lattin  et  al,  2003:  147-153).  Consumers  rated  12  brands 
of  cereal  on  each  of  25  attributes.  The  purpose  of  the  study  was  to  predict  which  brands 
consumers  would  be  more  likely  to  purchase  by  considering  a  smaller  number  of 
underlying  characteristics  (much  fewer  than  25)  represented  in  their  ratings.  The 
researchers  used  Factor  Analysis  to  identify  common  factors  which  account  for  the 
majority of the variance in the original data (Lattin et al, 2003: 148). Figure 2-3 presents
a visual representation of their results.

Figure 2-3: Results of Factor Analysis


Lattin’s  research  team  found  that  the  original  25  variables  could  be  represented  to 
a  large  extent  by  four  underlying  dimensions  which  they  chose  to  label  “Healthful”, 
“Artificial”,  “Non-Adult”  and  “Interesting”.  Each  variable  is  presented  as  a  member  of 
the group of variables comprising each factor. Variables in parentheses are negatively
correlated with the factor scores, meaning, for example, that as sogginess of cereal
increases, its “Interesting” score decreases (Lattin et al, 2003: 153). Their choice of
factor labels may be debatable; however, the benefit of the FA is clear: rather than
needing  to  consider  all  25  original  attributes,  we  may  be  able  to  understand  consumers’ 
cereal  preferences  by  looking  at  only  the  four  latent  factors  reflected  in  those  variables. 

It  is  important  to  note  that  the  common  factors  do  not  represent  a  mutually 
exclusive,  collectively  exhaustive  representation  of  the  original  data.  In  other  words,  the 
underlying factors may overlap as shown in Figure 2-3 in the sense that attributes which
are most associated with, or loaded on, one factor may also reflect variation in another.
In addition, the four factors retained in Lattin’s study only account for 52% of the total
variation in the original data (Lattin et al, 2003: 149).

If desired, an analyst may retain more common factors to account for more of
the variation in the data. Deciding how many common factors are sufficient to
characterize the data is to some degree an art. However, it is generally acceptable to only
retain  common  factors  which  “account  for  at  least  as  much  variation  as  one  of  the 
original  variables  in  the  analysis. . .”  (Lattin  et  al,  2003:  148).  Once  the  amount  of 
variation  explained  by  a  factor  falls  below  that  of  the  original  variables,  it  ceases  to 
provide  a  significant  improvement  to  the  model. 
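
One common way to make this retention rule concrete is sketched below. It assumes the p original variables are standardized, so that each contributes a variance of 1 and the total variance is p; the symbol v_k for the variation accounted for by factor k is introduced here for illustration only. The second line applies the same arithmetic to the cereal example, under the further assumption that the 25 attributes were standardized.

```latex
\[
\text{retain factor } k \iff v_k \ge 1 ,
\qquad
\text{proportion explained by } m \text{ factors} = \frac{1}{p} \sum_{k=1}^{m} v_k
\]
\[
p = 25,\; m = 4: \qquad \sum_{k=1}^{4} v_k \approx 0.52 \times 25 = 13
\]
```

Read this way, the four retained cereal factors together carry about as much variation as 13 of the 25 standardized attributes.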

2.4.2.2.  Using  Factor  Analysis  to  Identify  Failing  States 

Recall  that  two  primary  objectives  of  Factor  Analysis  are  to  reduce  the 
dimensionality  of  a  dataset  and  explore  relationships  or  commonalities  among  the 
different variables (Lattin et al, 2003: 127; Dillon et al, 1984: 53). For the purpose of
predicting  failing  states,  if  we  were  somehow  able  to  produce  a  single  yardstick  with 
which  we  could  measure  the  probability  of  a  state  collapsing,  we  would  have  no  need  for 
exploratory analysis. As described in the first portion of this chapter, however, there is no
single agreed-upon measure which can predict with any degree of accuracy when and if a
state will decline into failure. On the contrary, there are numerous studies on the subject,
all of which cite different variables as indicators of impending state failure. Table 2-1
shows a list of variables experts use for the purpose of predicting states in crisis. This
list is not intended to be exhaustive, but rather a compilation of variables encountered in
the scholarly literature. The bibliography includes more detailed citations. Often, variables
hypothesized  to  be  useful  for  such  predictions  are  chosen  as  much  on  the  basis  of  the 
availability  of  the  data  as  the  true  expectation  of  their  importance.  Once  again,  if  an 
accurate  attribute  “Probability  of  Collapse”  was  readily  available  for  each  state,  further 
study  would  not  be  necessary.  As  it  is,  we  are  left  to  do  what  we  can  with  the  data  that  is 
available.  In  this  thesis  we  have  used  Factor  Analysis  to  gauge  the  immeasurable 
Probability  of  Collapse  or  Crisis  Level,  as  they  are  reflected  in  the  myriad  of  variables 
currently available.

Table 2-1: Variables Considered by Subject Matter Experts
for Identifying Nations in Crisis


Repressive  Government  ‘A4-5.6.7.9.10.11.13 

GNP/GNI  -  Per  Capita  4,12 

Ongoing/Recent Violent Conflicts 1,2,4,5,8,9,10,13

Imports ($) 1,3

Representative  Government  2-4-5'6>7-9-1U3’14 

Imports  other  than  Money  1,3 

Stable  Government 'AWUO.h.b 

Private  Armies  1,8 

Cultural Factions 1,2,5,6,7,10,11

Proximity  to  "Bad  Neighbors" 2,7 

Economic  Trade  1’4’5-6’8’12-14 

Refugee/Displaced  Population  4,13 

Infant Mortality Rate 3,4,5,12,14

Rising  Standard  of  Living  1,2 

Legitimate  National  Security  Force 

Stable  Infrastructure  2,6 

Personal  Freedom  ,4,5,6 

Useable  Land  and  Land  Disputes  2,13 

Religious Factions 1,2,5,6,7

Water/Sanitation  3,13 

Corrupt  Government 1,2,8,10 

Annual  Food  Production  Change  4 

GDP  -  Per  Capita  4,5,9,14 

Battle  Deaths  4 

GDP  Growth  Rate  4,9,13,14 

CO2/CFC Emissions 3

IMF  Involvement  -  Foreign  Aid  12  3  4 

Condom  Use  3 

Poverty  Rate  1,3,7,13 

Consumer  Price  Index  4 

Publicly Accepted Constitution 1,2,6,8

Exports  other  than  Oil  1 

Unemployment  Rate  1,2,3,13 

Forest  Area  3 

Unmet  War  Goals  7,8,10 

Former  British  Colony  9 

Caloric  Intake  3,4,5 

Immigration  2 

Education  2,3,13 

Income  Inequality  (GINI)  4 

Life  Expectancy  1,5,12 

Inflation  2 

Literacy  2,3,12 

Maternal  Mortality  Rate  3 

Mass  Murder  Rate  1,4,6 

Military  Spending  -  %  of  GNP  4 

Population  Growth  Rate  2,9,13 

Murder  Rate  1 

Ratio  of  Population  age  15-29  /  30-54  3,5 

Per  Capita  Income  1 

AIDS  1,3 

Pollution  1 

Crime  Rate  1,8 

Population  9 

Disease  prevalence  -  Immunization  1,3 

Recognized  Stable  Borders  10 

Drug  Traffic  1,7 

Suicide  Rate  1 

Energy  Demand  13 

Telecommunications  1 

Equal Rights for Women 2,3

Terrorism  Incidents/Population  1 

Exports  ($)  1,3 

Tuberculosis  3 

Free  Press  2,11 

Underweight  Children  3 

Globalization  Factor  12 

US  military  Intervention  1 

Sources: (see bibliography for detailed citations)

1 Barnett (2005) Blueprint for Action
2 Brown (2001) Causes of Internal Conflict
3 UN Millennium Development Goals
4 Nafziger et al. (1997) War Hunger and Displacement
5 O’Brien (2002) Forecasting Country Instability
6 DoD Directive 3000.05 (2005)
7 Durch (2002) Briefing to CJCS
8 Dziedzic et al. (2005) Quest for Viable Peace
9 Poe et al. (2006) Journal of Conflict Resolution
10 Van Evera (2001) Hypotheses on Nationalism and War
11 Snyder et al. (2001) Nationalism and the Marketplace of Ideas
12 Stewart (2002) Root Causes of Conflict in Developing Countries
13 Rotberg (2004) When States Fail: Causes and Consequences
14 Esty et al. (2006) State Failure Task Force Report

2.4.3. Discriminant Analysis

Discriminant  Analysis  (DA)  is  a  technique  used  to  partition  a  set  of  subjects  into 
two  or  more  disjoint  groups  (Lattin  et  al,  2003:  426,  Dillon  et  al,  1984:  360).  It  does  this 
by using information captured in a set of independent variables to create the clearest
possible  separation  among  the  subjects,  and  assigning  them  to  their  most  likely  group 
(Lattin  et  al,  2003:  426). 

The  primary  objective  of  DA  is  to  classify  or  discriminate  among  individuals  or 
objects  (Dillon  et  al,  1984:  360).  If  we  know  what  distinguishes  members  of  various 
groups  from  one  another,  we  may  use  this  knowledge  to  assign  new  subjects  to  groups,  or 
to  predict  future  events  based  on  a  historical  record  of  behavior  of  members  of  a  certain 
group (Dillon et al, 1984: 363-4). In essence, this statistical sense of discrimination is
analogous to political or social discrimination, which assigns often intangible traits
or expectations to people based on measurable qualities such as race or gender.

The  goal  of  this  thesis  is  to  assist  analysts  in  determining  which  states  are  most 
likely  to  fail  or  erupt  into  violence  unless  some  intervention  occurs.  Therefore,  if  we 
consider  “failing  states”,  “borderline  states”,  and  “stable  states”  as  three  mutually 
exclusive,  collectively  exhaustive  groups  into  which  all  nations  can  be  classified,  DA 
may  be  a  useful  tool  for  this  research.  The  idea  is  to  find  the  linear  combination  of  the  set 
of  independent  variables  collected  on  all  countries  that  produces  the  largest  possible 
distinction  between  the  three  classifications  (Lattin  et  al,  2003:  429).  Once  this 
relationship  is  determined,  the  linear  combination  can  be  used  to  classify  previously 
unanalyzed  states,  or  the  same  states  at  various  points  over  time.  Chapter  3  explains  how 
DA was implemented in this study; the reader is also referred to Analyzing Multivariate
Data (Lattin et al, 2003) or Multivariate Analysis (Dillon et al, 1984) for detailed
discussion of the uses, assumptions, and mechanics of DA.
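
To illustrate the classification idea, the following Python sketch fits linear classification functions for three groups using two indicator variables and a pooled within-group covariance matrix, assuming equal prior probabilities. It is a minimal sketch, not the procedure used in Chapter 3: the indicator values, the group memberships, and all function names are invented for illustration, and the restriction to two variables is only to keep the matrix inverse short.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(2)]


def pooled_covariance(groups):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    dof = 0
    for rows in groups.values():
        m = mean_vector(rows)
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
        dof += len(rows) - 1
    return [[s[a][b] / dof for b in range(2)] for a in range(2)]


def invert_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def fit_lda(groups):
    """Return {label: (weights, constant)} for linear classification functions."""
    s_inv = invert_2x2(pooled_covariance(groups))
    model = {}
    for label, rows in groups.items():
        m = mean_vector(rows)
        w = [s_inv[0][0] * m[0] + s_inv[0][1] * m[1],
             s_inv[1][0] * m[0] + s_inv[1][1] * m[1]]
        c = -0.5 * (w[0] * m[0] + w[1] * m[1])
        model[label] = (w, c)
    return model


def classify(model, x):
    """Assign x to the group whose linear score is largest (equal priors assumed)."""
    def score(label):
        w, c = model[label]
        return w[0] * x[0] + w[1] * x[1] + c
    return max(model, key=score)


# Invented training data: (indicator 1, indicator 2) for states with known groups.
groups = {
    "Stable": [(8.0, 1.0), (9.0, 2.0), (8.5, 1.2), (9.5, 1.8)],
    "Borderline": [(5.0, 4.0), (6.0, 5.0), (5.5, 4.2), (6.5, 4.8)],
    "Failing": [(1.0, 8.0), (2.0, 9.0), (1.5, 8.2), (2.5, 8.8)],
}
model = fit_lda(groups)
labels = [classify(model, x) for x in [(9.0, 1.5), (5.5, 4.5), (2.0, 8.0)]]
```

With equal priors, choosing the largest linear score is equivalent to assigning each new state to the group whose mean is closest in Mahalanobis distance, which is why one fitted model can be reused for previously unanalyzed states.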

2.4.4.  Summary  of  Techniques 

The  majority  of  recent  work  in  predicting  failing  states  is  centered  on  using 
Ordinary Least Squares Regression, sometimes with variable transformations. However,
such  efforts  require  a  well-defined  and  quantifiable  crisis  level  to  be  known  for  each  state 
in  advance  of  model  construction,  and  defining  such  a  value  has  proven  quite  difficult  for 
researchers.  In  contrast,  Exploratory  Factor  Analysis  can  reduce  the  dimensionality  of 
currently  available  data  and  characterize  the  immeasurable,  underlying  factors  reflected 
in  it.  Correlations  between  these  factors  and  the  initial  variables  can  identify  which 
subset  of  variables  captures  a  significant  amount  of  the  total  information.  Discriminant 
Analysis  can  then  be  performed  on  the  full  or  reduced  dataset  in  order  to  classify 
countries  into  similar  groups,  and  comparisons  of  these  groups  with  those  provided  by 
subject  matter  experts  will  serve  to  identify  the  most  critical  nations.  These  techniques 
are  not  proposed  in  order  to  refute  other  models  found  in  the  literature,  but  rather  to 
augment  or  validate  ongoing  crisis  prevention  efforts. 

2.5.  Chapter  Summary 

Many  analysts  and  organizations  are  attempting  to  predict  failed  or  failing  states 
using  mathematical  modeling.  The  reasons  for  each  study  vary,  but  from  a  Department 
of  Defense  standpoint,  failing  states  provide  safe  havens  and  recruitment  pools  for 
terrorist organizations. It is in our national interest to identify failing states and, if
possible,  prevent  them  from  collapsing.  This  thesis  assists  in  that  endeavor  using 
appropriate analytical techniques.

3. Methodology


3.1.  Introduction 

This  chapter  describes  the  analysis  techniques  used  in  this  study  to  identify 
unstable  or  failing  states.  It  begins  with  a  description  of  how  variables  were  chosen  for 
initial  consideration.  Next,  it  discusses  the  methods  used  to  reduce  and  refine  the  set  of 
selected  variables,  and  for  dealing  with  missing  data.  The  chapter  concludes  with  an 
explanation  of  the  process  used  to  select  variables  most  critical  for  the  purposes  of 
identifying  unstable  nations,  and  classifying  states  as  critical  or  failing. 

3.2.  Data 

A  primary  purpose  of  this  study  is  to  identify  a  relatively  small  set  of  key 
variables,  available  through  open  source,  which  could  be  used  to  classify  states  in  terms 
of  their  overall  stability.  These  variables  may  then  be  used  to  determine  which  states  are 
most  likely  to  fail,  or  otherwise  experience  some  form  of  crisis.  As  explained  in  Chapter 
2,  various  subject  matter  experts  (SME)  and  organizations  use  a  myriad  of  variables  for 
just  such  a  purpose.  This  section  describes  our  methods  for  building  our  initial  dataset, 
and  the  preliminary  steps  used  to  reduce  the  number  of  variables.  Our  technique  for 
dealing  with  missing  data  is  also  discussed. 

3.2.1.  Initial  Dataset 

Our  data  collection  began  by  exploring  the  scholarly  literature  for  any  variables 
SME  considered  relevant  for  the  purpose  of  identifying  failing  states.  A  list  of  these 
variables  served  as  a  minimum  amount  of  data  to  be  analyzed  in  this  study.  In  addition  to 
the SME-identified variables, any other variables encountered during the search, and any
we could reasonably expect to contribute to the model, were included in our initial
dataset as well. Table 2-1 provides a summary of SME variables used, and Appendix A
provides a list of all variables collected.

In  all,  167  variables  were  collected  for  consideration.  However,  several  of  these 
variables  were  not  unique.  For  example,  Gross  Domestic  Product  (GDP)  Per  Capita  was 
collected from both the World Bank and United Nations Statistics Division. Both were
included  in  the  initial  dataset  for  two  main  reasons.  First,  since  data  was  collected 
indirectly  through  open  source,  not  directly  by  our  team,  comparing  several  like  variables 
could  serve  to  validate  the  data  and  improve  confidence  in  it.  Second,  as  data  needed  to 
be copied, pasted, moved and in some cases transformed or calculated several times
throughout  the  database  construction  process,  two  previously  identical  variables  suddenly 
appearing  different  would  signal  that  a  computation  error  may  have  been  made.  In  other 
words,  retaining  two  identical  variables  from  different  sources  served  as  an  error  check 
for  both  our  work,  and  the  various  agencies  from  which  our  data  was  drawn. 

Clearly,  167  variables  could  not  possibly  encapsulate  everything  about  the  status 
of  a  country.  Fensterer,  compiling  the  work  of  a  number  of  authors,  outlines  44  broad 
categories  for  assessing  state  stability  (Fensterer,  2007).  Each  of  these  categories,  for 
example  Judicial  Effectiveness,  may  require  dozens  of  variables  collected  over  time  to 
truly  gain  a  perspective  on  each  nation’s  status.  However,  there  is  a  large  gap  between 
what  is  needed  to  fully  assess  state  stability  and  what  is  currently  being  collected  and 
made  available.  Table  3-1  divides  the  multinational  data  into  four  main  categories  based 
on  a  specific  variable’s  importance  in  assessing  state  stability  and  the  level  at  which  the 
data is available from open sources.

Table 3-1: Data Collection Focus

                    | Importance in Predicting State Failure
Availability        | Significant                    | Insignificant
--------------------+--------------------------------+-------------------------------
Collected and       | Most useful in predicting      | Marginal benefit gained. Data
Readily Available   | failing states. Collection     | collection resources may be
                    | efforts should be continued    | available to be drawn upon for
                    | and refined as necessary.      | other purposes.
--------------------+--------------------------------+-------------------------------
Not Currently       | Intelligence requirement.      | Initiating collection efforts
Collected           | Should develop metrics and     | unlikely to provide additional
                    | begin data collection for      | benefit.
                    | comprehensive assessment.      |

The  focus  of  this  thesis  is  to  consider  all  currently  available  open  source  variables  and 
determine which are truly significant for our purposes. The reader is encouraged to refer
to  Fensterer’s  work  and  the  Iraq  Study  Group  Report  published  in  November,  2006  for 
analysis  of  the  types  of  variables  which,  if  collected,  may  provide  an  even  more 
comprehensive  assessment  of  state  stability. 

Once  the  initial  dataset  was  finalized,  the  process  of  data  reduction  began. 

Recall  that  one  of  the  hypotheses  of  this  thesis  is  that  adequate  classification  of  failing 
states  is  possible  with  as  few  as  ten  open-source  variables.  The  next  section  describes  the 
first  step  in  moving  toward  that  goal. 

3.2.2.  Reduced  Dataset 

Our  initial  dataset  represented  a  collection  of  data  in  three  major  dimensions: 
countries,  years,  and  variables.  We  collected  data  on  242  countries  from  1995-2005 
across 167 variables. Unfortunately, only about 51% of the database was populated.
This section describes the methods used to generate a single, fully-populated dataset for
further analysis.

3.2.2.1. Removal of Countries


Of  the  242  countries  comprising  our  initial  study,  42  were  removed  on  the  basis 
of  their  lack  of  available  data,  or  the  fact  that  the  country  no  longer  exists.  None  of  these 
countries’  records  were  populated  with  more  than  20%  of  the  variables  in  the  dataset. 
Many  were  small  island  states  or  recent  protectorates.  It  is  our  contention  that  the 
removal of these countries did not materially impact the usefulness of this study; however,
the omitted countries are provided here for informational purposes.


Table  3-2:  Countries  Removed  From  Study 


Abkhazia                   Glorioso Islands            Reunion
Akrotiri                   Greenland                   Saint Helena
American Samoa             Guadeloupe                  Saint Pierre and Miquelon
British Virgin Islands     Holy See                    South Georgia
Channel Islands            Isle of Man                 Spratly Islands
Christmas Island           Martinique                  Taiwan
Cocos (Keeling) Islands    Mayotte                     Tokelau
Cook Islands               Monaco                      Turks and Caicos Islands
Dhekelia                   Montenegro                  Tuvalu
Europa Island              Montserrat                  US Virgin Islands
Falkland Islands           Nauru                       Wallis and Futuna
Faroe Islands              Niue                        West Bank (Combined w/ Gaza Strip)
French Guiana              Northern Mariana Islands    Western Sahara
Gibraltar                  Pitcairn Islands            Yugoslavia

If subject matter experts determined that some or all of these nations were of political
interest,  the  methodology  proposed  here  could  be  utilized,  provided  requirements  were 
created  to  collect  the  necessary  data. 


3.2.2.2.  Reduction  in  the  Number  of  Years  Analyzed 

Several  variables,  such  as  Net  Migration  Rate,  have  historically  only  been 
collected  every  few  years.  In  addition,  even  if  a  data  collection  agent  collects  data 
annually, not every country is assessed every year. For this reason, a look at any
individual year would result in no less than 44% missing data, which is the amount
missing  from  the  2004  dataset.  Of  course,  if  the  data  were  never  collected,  it  is  virtually 
impossible  to  go  back  and  accurately  and  directly  fill  in  the  missing  values.  However, 
there  are  methods  for  dealing  with  such  an  issue. 

The  focus  of  this  thesis  is  on  the  identification  of  key  variables  that  may  be  used 
to  identify  failing  states.  The  underlying  assumption  here  is  that  the  relationship  among, 
and  the  importance  of,  each  of  these  key  variables  remains  relatively  constant  over  time. 
Therefore,  for  this  initial  analysis,  the  most  recent  data  available  for  each  country  was 
used.  This  had  the  effect  of  reducing  the  amount  of  data  we  would  need  to  impute  to  less 
than 20%. However, as is recommended in Chapter 5, it would certainly be useful to
investigate a process for imputing data which is missing for some, but not all, years.
Such a technique would allow time-series analysis of the data, which could prove more
useful for prediction, as opposed to classification, of failing states.
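
The step of carrying forward the most recent available value can be sketched as follows. This is an illustrative sketch only: the data layout (a dictionary keyed by country and variable, holding year-to-value mappings), the function names, and the sample values are assumptions, not a description of the database built for this study.

```python
def most_recent(series):
    """Return the value for the latest year that is populated, or None."""
    for year in sorted(series, reverse=True):
        if series[year] is not None:
            return series[year]
    return None


def collapse_to_latest(panel):
    """Reduce a (country, variable) -> {year: value} panel to one value per cell."""
    return {key: most_recent(series) for key, series in panel.items()}


def fraction_missing(cells):
    """Share of cells that would still need to be imputed."""
    return sum(1 for v in cells.values() if v is None) / len(cells)


# Invented values for illustration.
panel = {
    ("Country A", "Net Migration Rate"): {1995: 1.2, 2000: 0.8, 2005: None},
    ("Country A", "GDP Per Capita"): {2003: 410.0, 2004: 425.0, 2005: 440.0},
    ("Country B", "Net Migration Rate"): {1995: None, 2000: None, 2005: None},
}
latest = collapse_to_latest(panel)
still_missing = fraction_missing(latest)
```

Only cells with no observation in any year remain missing after this step, which is why collapsing across years lowers the share of values that must be imputed.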

3.2.2.3.  Variable  Reduction 

As  mentioned  earlier,  several  of  the  167  initial  variables  were  essentially  identical 
in  that  they  measured  exactly  or  nearly  the  same  thing,  but  were  possibly  collected  by 
different  sources,  perhaps  using  differing  methods.  However,  even  two  distinct  variables 
can be redundant. Multicollinearity occurs when two or more of the independent
variables are correlated. If the correlation is high, or a combination of variables is
linearly dependent, the variables are capturing related variance. It can also occur if one
of the variables is close to a linear combination of one or more of the others. With the
large number of variables in our initial dataset, a high degree of multicollinearity is
virtually guaranteed. Therefore, the first step in reducing the dataset was to remove at
least  one  of  every  two  variables  shown  to  be  highly  correlated. 

Before  continuing  on,  several  comments  on  the  necessity  of  this  step  are  in  order. 
As  described  in  Chapter  2,  Factor  Analysis  is  our  preferred  method  for  reducing  the 
dimensionality  of  a  dataset.  It  has  the  benefit  of  being  robust  to  multicollinearity,  and 
incorporates  information  from  each  variable,  minimal  though  it  may  be.  However,  there 
are  several  reasons  to  reduce  the  number  of  variables  prior  to  performing  FA. 

First,  Appendix  A  shows  a  gross  measure  of  data  availability  for  each  of  the 
initial  variables.  The  percentage  shown  is  the  proportion  of  countries  for  which  at  least 
one  value  was  collected  on  that  variable  between  1995  and  2005.  Because  of  the 
significant  amount  of  missing  data  in  our  dataset,  values  would  need  to  be  imputed  in 
order  to  accomplish  some  of  the  other  techniques  used  later  such  as  Factor  Analysis  and 
Discriminant  Analysis.  However,  each  time  data  is  imputed,  some  amount  of  additional 
uncertainty  is  generated.  This  uncertainty  biases  the  model  and  is  not  directly  reflected  in 
the  results.  Therefore,  if  we  are  able  to  propose  reasonable  justification  for  discarding 
variables  with  significant  amounts  of  incomplete  data,  unnecessary  uncertainty  can  be 
avoided. 

Second,  most  data  imputation  techniques,  such  as  the  Nearest  Neighbor  method 
used  in  this  study,  assume  the  data  is  normally  distributed,  and  provide  better  substitute 
values  if  that  assumption  is  met.  Furthermore,  when  using  Linear  Discriminant  Analysis, 
optimal  results  are  only  attained  if  the  independent  variables  are  normally  distributed 
(Dillon et al, 1984: 379). Therefore, each variable included during these steps would
need to be checked individually for normality prior to analysis, and transformed as
appropriate. Obviously, provided no material information would be lost, a smaller set of
variables  to  examine  and  test  was  desirable. 

Finally,  while  Factor  Analysis  does  reduce  the  dimensionality  of  the  data,  to  truly 
understand  the  underlying  structure  the  analyst  must  characterize  each  of  the  latent 
factors.  This  is  typically  done  by  considering  which  variables  are  most  heavily  loaded  on 
each  of  the  factors  as  seen  in  the  cereal  example  in  Chapter  2.  This  process  is 
considerably  more  involved  when  an  extremely  large  number  of  variables  is  used. 

For  these  reasons,  we  examined  the  correlation  between  each  pair  of  variables  in 
the  initial  dataset.  The  Pearson  correlation  coefficient  of  any  two  variables  x  and  y 
represents  the  degree  of  linear  dependence  between  the  two  and  can  be  computed  by 

\[
\rho = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}
            {\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}
\]

and -1.0 ≤ ρ ≤ 1.0. Values close to +1 suggest high positive correlation, meaning as x
increases, y increases. Values close to -1 indicate negative correlation in which as x
increases, y decreases. Values of ρ near zero indicate no correlation (Wackerly et al,
2002: 251).
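
The Pearson coefficient described above can be computed directly from paired observations. The short Python function below is a sketch for illustration; the function name and the sample values are not from the thesis.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (numerator).
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (under the square root in the denominator).
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rising = pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])   # perfectly increasing
falling = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])            # perfectly decreasing
```

The function is undefined when either variable is constant, since the denominator is then zero; a production version would need to handle that case explicitly.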

A portion of the resulting correlation matrix is shown in Table 3-3, with
significant correlation values highlighted.

Table 3-3: Sample from Variable Correlation Matrix

        A110    A111    A112    A113    A114    A115    A116    A117    A118    A119
A131   0.051   0.586  -0.631   0.271   0.355   0.030  -0.209  -0.020  -0.017   0.350
A132   0.419   0.599  -0.613   0.105   0.165   0.237  -0.282  -0.103  -0.126   0.835
A133   0.046   0.010   0.041  -0.043  -0.114   0.154   0.020   0.002  -0.035   0.127
A134   0.689   0.351  -0.385   0.074   0.120   0.095  -0.169  -0.092  -0.006   0.341
A135   0.080   0.143  -0.186  -0.088   0.036   0.815  -0.321  -0.142  -0.205   0.211
A136  -0.176  -0.152   0.135   0.028  -0.053  -0.121   0.214   0.148   0.153  -0.139
A137   0.045   0.437  -0.430   0.086   0.172  -0.159   0.089   0.037   0.181   0.472
A138  -0.116   0.429  -0.358   0.208   0.091  -0.171   0.164   0.299   0.042   0.134
A139  -0.165  -0.712   0.823  -0.272  -0.369  -0.202   0.268   0.013   0.064   0.022
A140   0.203   0.789  -0.798   0.212   0.309   0.251  -0.337  -0.128  -0.144   0.765
A141   0.064  -0.486   0.516  -0.140  -0.320  -0.050   0.136  -0.068  -0.025  -0.314
A142  -0.086  -0.045  -0.196  -0.204  -0.141   0.113  -0.229  -0.110  -0.068   0.031
A143   0.174   0.656  -0.630   0.213   0.229   0.135  -0.228  -0.086  -0.079   0.973
A144  -0.189  -0.051  -0.121   0.020   0.025   0.124   0.178   0.161   0.303   0.041
A145   0.207   0.669  -0.637   0.212   0.233   0.148  -0.226  -0.107  -0.060   0.961
A146   0.298   0.760  -0.730   0.231   0.283   0.223  -0.263  -0.117  -0.059   0.934
A147   0.213   0.627  -0.573   0.175   0.226   0.067  -0.183  -0.079  -0.040   0.935
A148   0.067   0.485  -0.648  -0.168   0.198   0.254  -0.275  -0.115  -0.154   0.403
A149   0.126   0.769  -0.788   0.372   0.373   0.176  -0.249  -0.130  -0.054   0.477

Note  that  variable  names  were  coded  for  space  and  software  integration  purposes. 
Appendix A contains a complete listing of variables and codes. From the matrix we see,
for example, that variable A119, which is GDP Per Capita, appears to be highly
positively correlated with several variables including A132, Electric Power Consumption.
In addition, A111, Caloric Intake, is negatively correlated with A139, Number of Births
per Woman. Other correlations are less direct, such as the high positive correlation
between A111 and A140, Caloric Intake versus Number of People per 1,000 with
Fixed-line or Mobile Phones.

Values above 0.7 (or below -0.7) indicated a strong linear relationship: at this
magnitude, roughly half (0.7^2 = 0.49) of the variation in one variable could be
represented by the other. Therefore, at least one of these variables was removed. For
cases in which several variables were all highly correlated with each other, only one
variable was retained. The 0.7 cutoff was a subjective choice. Figure 3-1 shows the
number of correlations present as a function of the cutoff value chosen. As the
threshold for significant correlation decreases, the number of variables to be discarded
grows exponentially.


Figure 3-1: Number of Correlations as a Function of Threshold

In deciding an appropriate value to use as a cutoff, we needed to strike a balance
between  the  gains  and  losses  associated  with  removing  variables  from  our  model. 
Removing  superfluous  data  would  significantly  decrease  computation  and  interpretation 
time  in  later  steps,  and  limit  the  uncertainty  introduced  during  imputation.  On  the  other 
hand,  prematurely  disregarding  variables  may  result  in  a  loss  of  important  information. 
In  addition,  FA  will  be  used  to  further  reduce  the  number  of  variable  dimensions,  so  it  is 
not  necessary  to  remove  every  instance  of  multicollinearity  during  this  step.  In  order  to 
retain  as  much  information  as  possible,  it  would  be  better  to  err  on  the  side  of  choosing 
too  high  a  threshold  for  removal.  The  0.7  threshold  seems  to  have  provided  the 
appropriate balance, as evidenced by the number and breadth of variables retained.

Using 0.7 as a threshold, the number of variables carried forward was reduced
from  167  to  60.  In  almost  all  cases,  each  variable  removed  was  still  represented  in  the 
reduced  dataset  not  only  by  at  least  one  highly  correlated  variable,  but  also  one  that 
seemed  to  be  an  intuitive  proxy  for  the  removed  variable.  For  example,  Pupil-Teacher 
Ratio  -  Primary  Level  was  replaced  by  Pupil-Teacher  Ratio  -  Secondary  Level.  The  60 
remaining  variables  still  represented  a  much  greater  number  than  any  model  found  in  the 
literature,  which  further  suggested  that  the  0.7  value  was  not  too  low. 

When  a  discrepancy  existed  in  the  amount  of  data  available  for  two  highly 
correlated  variables,  the  variable  with  more  missing  data  was  discarded.  This  approach 
improved  the  average  percentage  of  available  data  per  retained  variable  from  81%  to 
85%,  and  the  variance  in  the  amount  of  data  present  decreased  by  18%.  The  list  of 
variables  retained  for  the  reduced  dataset  can  be  found  in  Appendix  B. 
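
The removal rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names, correlations, and missing-data counts are hypothetical, and the single greedy pass over pairs is our own simplification, since the order in which pairs were examined is not specified here.

```python
def prune_correlated(names, corr, missing, cutoff=0.7):
    """Greedy sketch: for each pair with |r| above the cutoff, drop the
    variable with more missing values (ties drop the later-listed one)."""
    dropped = set()
    n = len(names)
    for i in range(n):
        for j in range(i + 1, n):
            if i in dropped or j in dropped:
                continue
            if abs(corr[i][j]) > cutoff:
                # Keep whichever variable of the pair has fewer missing values.
                dropped.add(j if missing[j] >= missing[i] else i)
    return [names[k] for k in range(n) if k not in dropped]


# Hypothetical toy data: three variables, two of them highly correlated.
names = ["gdp_per_capita", "electric_power", "births_per_woman"]
corr = [
    [1.00, 0.85, -0.40],
    [0.85, 1.00, -0.35],
    [-0.40, -0.35, 1.00],
]
missing = [5, 20, 8]  # count of missing observations per variable
kept = prune_correlated(names, corr, missing)
print(kept)
```

In this toy case the second variable is discarded because it is highly correlated with the first and has more missing observations.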

3.2.3.  Variable  Transformation 

Once the variables to be used for the remainder of the study were selected, the
distribution of each variable was examined for gross violations of the normality
assumption. If a variable did not appear at least approximately normally distributed,
we could, as discussed earlier, transform it to improve the results of the data
imputation and Discriminant Analysis. The following is an example of the variable
transformation process used in this study.

One of the simplest ways to check for normality in a variable is to plot a
histogram of the data. Figure 3-2 shows a histogram of the Population Density of the
nations of the world in thousands of people per square mile.

Figure 3-2: Histogram of Population Density Data Before Transformation


Clearly, the data do not appear normal. More formally, three common
Goodness-of-Fit (GOF) tests can be run on the data to see if the hypothesis that the
data come from a normal distribution can be rejected. They are the Chi-Squared,
Anderson-Darling, and Kolmogorov-Smirnov tests.

The Chi-Squared Statistic is the most popular Goodness-of-Fit test. It separates
data into K groups or bins and compares the number of entries in each bin with the
number one would expect to be in each if the data were distributed according to the
distribution being tested, in this case the normal distribution. The Chi-Squared Test
Statistic then is

$$\chi^2 = \sum_{i=1}^{K} \frac{(N_i - E_i)^2}{E_i}$$

where K is the number of bins, $N_i$ is the observed number of entries in the ith bin,
and $E_i$ is the expected number of entries in the ith bin, given a normal distribution.
Notice that as the difference between the observed and expected values increases, the
Chi-square test statistic increases. Thus if this difference is sufficiently large, we
may conclude that the true distribution of the data is not normal.
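
As a minimal sketch of the Chi-Squared statistic just described, the Python below bins a sample into K equal-probability bins of a fitted normal distribution, so that every expected count is n / K. The two samples are hypothetical, and fitting the normal by the sample mean and standard deviation is our assumption rather than a documented step of this study.

```python
import math
from statistics import NormalDist, mean, stdev


def chi_square_normal(data, k):
    """Chi-squared statistic against a normal fitted by sample mean/stdev,
    using k bins of equal probability (so each expected count is n / k)."""
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))
    # Interior bin edges at the 1/k, 2/k, ... quantiles of the fitted normal.
    edges = [fitted.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in data:
        # Bin index = number of edges lying below x.
        observed[sum(x > e for e in edges)] += 1
    expected = n / k
    return sum((o - expected) ** 2 / expected for o in observed)


# Hypothetical samples: exact normal quantiles, and their exponentials
# (a right-skewed sample).
z = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
skewed = [math.exp(v) for v in z]

chi_normal = chi_square_normal(z, k=5)
chi_skewed = chi_square_normal(skewed, k=5)
print(chi_normal, chi_skewed)
```

The skewed sample produces a much larger statistic than the near-normal one, which is the behavior the test relies on.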

The Anderson-Darling (A-D) test uses the following test statistic to determine if
the data come from the hypothesized cumulative distribution function (cdf):

$$A^2 = -N - S$$

where

N = sample size

$$S = \sum_{k=1}^{N} \frac{2k-1}{N}\left[\ln F(Y_k) + \ln\left(1 - F(Y_{N+1-k})\right)\right]$$

$Y_k$ = kth ordered data point of Y

F = hypothesized cdf

Again, larger values of this statistic indicate violations of the normality assumption.

The final test for normality is the Kolmogorov-Smirnov (K-S) Test, which uses
a simpler test statistic

$$D = \max_x \left| F_n(x) - F(x) \right|$$

where

$$F_n(x) = \frac{\text{number of ordered observations below } x}{n}$$

and

F(x) = hypothesized cdf

The K-S test measures the value of the greatest discrepancy between the observed and the
hypothesized, in this case the normal, cdf.
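
Both statistics can be computed directly from their definitions. The following is a sketch only: the samples are hypothetical, and the hypothesized cdf is taken to be a normal fitted by the sample mean and standard deviation, which is our assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(data, cdf):
    """A^2 = -N - S, with S summed over the ordered sample."""
    y = sorted(data)
    n = len(y)
    s = sum(
        (2 * k - 1) / n
        * (math.log(cdf(y[k - 1])) + math.log(1.0 - cdf(y[n - k])))
        for k in range(1, n + 1)
    )
    return -n - s


def kolmogorov_smirnov(data, cdf):
    """D = max |F_n(x) - F(x)|, checked just before and at each ordered point."""
    y = sorted(data)
    n = len(y)
    d = 0.0
    for k, value in enumerate(y, start=1):
        f = cdf(value)
        # The empirical cdf steps from (k-1)/n to k/n at the k-th ordered point.
        d = max(d, abs(k / n - f), abs((k - 1) / n - f))
    return d


def fitted_cdf(sample):
    """cdf of a normal distribution fitted to the sample (an assumption)."""
    return NormalDist(mean(sample), stdev(sample)).cdf


# Hypothetical samples: exact normal quantiles and a right-skewed transform.
z = [NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
skewed = [math.exp(v) for v in z]

ad_normal = anderson_darling(z, fitted_cdf(z))
ad_skewed = anderson_darling(skewed, fitted_cdf(skewed))
ks_normal = kolmogorov_smirnov(z, fitted_cdf(z))
ks_skewed = kolmogorov_smirnov(skewed, fitted_cdf(skewed))
print(ad_normal < ad_skewed, ks_normal < ks_skewed)
```

Note that the A-D sum takes logarithms of F and 1 - F, so it fails if the hypothesized cdf evaluates to exactly 0 or 1 at a data point.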

All  three  tests  can  be  used  to  test  the  same  hypotheses: 

Ho:  The  data  are  drawn  from  a  normal  distribution 
Ha:  The  data  are  not  drawn  from  a  normal  distribution. 

The  results  for  the  raw  Population  Density  data  are  provided  in  Table  3-4. 


Table 3-4: GOF Test Results for Non-Transformed Data

                  Chi-Square    A-D        K-S
Test Statistic    231.43        20.93      0.25
P-value           0.0000        < 0.005    < 0.01


Here, the p-value represents the probability that a sample of the same size drawn from
the hypothesized normal distribution could generate a test statistic at least as high as
the one observed. Because such a sample would essentially never produce statistics this
large, we reject the null hypothesis that the observed Population Density data came from
a normal distribution.

To remedy this, we attempt to transform the variable in such a way that the
resulting values more closely follow a normal distribution. In the case of Population
Density, taking the natural logarithm of the values produced the distribution shown in
Figure 3-3.


Figure 3-3: Histogram of Population Density Data After Transformation

Visually, it appears at least plausible that the transformed data follow a normal
distribution. Running the aforementioned tests again produced the following results:


Table 3-5: GOF Test Results for Transformed Data

                  Chi-Square    A-D        K-S
Test Statistic    14.28         1.00       0.07
P-value           0.4291        < 0.025    < 0.01


Each of the test statistics was dramatically reduced when using the transformed
data. While the A-D and K-S tests still reject normality, the Chi-square test does not:
its p-value of 0.43 means that a truly normal sample would produce a statistic at least
this large about 43% of the time. The Chi-square test can be considered valid if the
expected bin frequencies are sufficiently large, typically greater than five (Banks et
al., 2005: 327). Dividing the range of the data into 15 bins of equal probability, each
bin has an expected frequency of 12.93, which is sufficient. For our purposes, the
transformed data will be used as it is much closer to the desired distribution than the
raw data. Appendix B lists the transformation used for each of the variables in the
reduced dataset.
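
The effect of a logarithmic transformation on a right-skewed variable can be illustrated with a quick skewness check. The density values below are hypothetical placeholders, not the study's data, and sample skewness is used here only as a rough stand-in for the formal GOF tests.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for symmetric data)."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical right-skewed densities (people per unit area).
densities = [2, 5, 9, 15, 30, 60, 120, 400, 1500, 16000]
logged = [math.log(v) for v in densities]

skew_raw = skewness(densities)
skew_logged = skewness(logged)
print(skew_raw, skew_logged)
```

The logged values are far closer to symmetric than the raw ones, which is why the natural logarithm is a common first choice for variables spanning several orders of magnitude.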

3.2.4.  Missing  Data  Imputation 

As  discussed  in  Chapter  2,  there  are  several  methods  for  dealing  with  missing 
data.  For  this  thesis,  we  used  a  Nearest  Neighbor  Hot  Deck  Imputation  (NNI)  available 
in  the  statistical  package  XLSTAT.  This  procedure  replaces  missing  values  for  a  given 
record  with  values  observed  for  a  similar  record,  the  nearest  neighbor.  A  complete 
imputed dataset is then output for future analysis.

To find the nearest neighbor to country x, the distance in the variable space
between it and every other country y on the basis of the 60 variables is calculated using
the Euclidean Distance formula

$$d_{xy} = \sqrt{\sum_{i=1}^{60} (x_i - y_i)^2}$$

For cases in which multiple variables are missing for two observations, the average of the
distances between each of the non-missing variables is substituted for the missing
distance values. In the end, each country is assigned a rank order of its nearest neighbors.
If a state was missing a value on one of its variables, a value was imputed from the most
similar country having a value on that variable.
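
The imputation procedure can be sketched as follows. This is our reading of the description above, not XLSTAT's actual implementation: missing values are represented by None, each variable missing in either record contributes the average squared gap of the variables both records share, and the three records are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance over shared variables; each variable missing in
    either record contributes the average of the available squared gaps."""
    gaps = [(x - y) ** 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not gaps:
        return math.inf
    return math.sqrt(sum(gaps) / len(gaps) * len(a))


def impute(records):
    """Fill each missing value from the nearest record that has that value."""
    filled = [list(r) for r in records]
    for i, rec in enumerate(records):
        # Rank the other records from nearest to farthest.
        order = sorted(
            (j for j in range(len(records)) if j != i),
            key=lambda j: distance(rec, records[j]),
        )
        for v, value in enumerate(rec):
            if value is None:
                donor = next((j for j in order if records[j][v] is not None), None)
                if donor is not None:
                    filled[i][v] = records[donor][v]
    return filled


# Hypothetical records: the first is missing its third variable.
records = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [5.0, 6.0, 9.0],
]
filled = impute(records)
print(filled[0])
```

The first record borrows its missing value from the second record, its nearest neighbor, rather than from the distant third record.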

With the data imputation complete, the dataset used for the remainder of this study was
workable. Appendix B provides a list of the variables considered, and Appendix C lists
the countries analyzed. The next step was to determine which variables in the dataset
were most significant for characterizing nations.

3.3.  Variable  Selection 

This section provides the methodology used to achieve the first of the two
primary goals for this thesis - identify a set of key variables which can be used to classify
weak or failing states. We used two overarching methods for variable selection: Factor
Analysis, and the model construction procedures within Discriminant Analysis. The
variables selected using each method, and the resulting discriminant functions, can be
found in Chapter 4.

3.3.1. Factor Analysis


Using  FA  to  identify  key  variables  in  a  dataset  involves  two  primary  steps.  First, 
we  identify  the  principal  factors  characterizing  the  latent  structure  of  the  data.  A 
relatively  small  number  of  principal  factors  accounts  for  the  majority  of  variation  in  the 
entire  dataset.  Second,  we  examine  these  factors,  label  them,  and  select  those  variables 
which  load  most  heavily  on  the  key  factors.  For  this  study,  we  used  three  different 
techniques  for  selecting  variables  to  be  used  for  classifying  states  based  on  factor  scores. 

Technique  1  was  simply  to  discriminate  based  on  each  country’s  factor  scores 
across  all  factors.  This  is  equivalent  to  using  the  factor  scores  as  observable  variables. 
This  technique  has  the  benefit  of  including  as  much  information  as  possible  from  the 
original  dataset,  but  is  less  useful  in  the  sense  that  all  60  variables  are  required  to 
generate  these  scores.  Technique  2  was  to  choose  those  variables  most  heavily  loaded  on 
the  first  and  second  principal  factors.  These  factors  would  account  for  the  majority  of  the 
variation  described  by  the  entire  set  of  factors.  The  third  technique  used  was  to  choose  at 
least  one  variable  from  each  of  the  retained  factors. 

3.3.1.1. Mechanics of Factor Analysis

This  section  provides  the  mechanics  involved  in  FA.  It  is  compiled  from  Dillon 
and  Goldstein,  1984  and  Lattin,  Carroll  and  Green,  2003.  Further  details  on  FA  can  be 
found  in  either  of  these  texts. 

For illustrative purposes, assume we have a dataset X of observable variables $X_i$.
FA assumes that the variation in each variable is attributable to two sources: that which
is inherent in the variable, denoted $\delta_i$, and that which can be attributed to some
number of common factors $\xi$. If we assume a two factor model is hypothesized, then the
observed value for each of the variables could be written as

$$X_i = \lambda_{i1}\xi_1 + \lambda_{i2}\xi_2 + \delta_i$$

where the $\lambda$'s are the coefficients which reflect the variation attributable to each
of the common factors (Lattin et al, 2003: 133). Thus, if we hypothesize that a country's
Infant Mortality Rate (IMR) is really a function of two immeasurable factors such as
National Economy and Health Services, we could predict a given country's IMR if we knew
these factor scores. However, in FA we have the observed data; what we are interested in
is discovering and characterizing the latent factors which resulted in the data observed.

To find these common factors, we begin with the correlation matrix R, made up of
all correlations between each pair of variables. We are now interested in determining if
there are one or more underlying factors such that these correlations fall to zero if the
variation attributable to such factors is removed. That is, does the following equation
hold for some $\xi$ (Dillon et al, 1984: 64-65)?

$$\rho(X_i, X_j \mid \xi) = 0, \quad i \neq j$$

If so, then a set of common factors $\xi$ must exist such that the equation for $X_i$
above is true.

If we now assume that the portions of the variation common to the underlying
factors and unique to the variable itself are uncorrelated, and we standardize the $\xi$ so
that their mean is 0 and standard deviation is 1, we can express the variance of any
$X_i$ as

$$\operatorname{var}(X_i) = \operatorname{var}\left(\sum_k \lambda_{ik}\xi_k\right) + \operatorname{var}(\delta_i).$$

The first term on the right is the variance in $X_i$ attributable to the common factors.
This is also known as the communality of $X_i$, and is often denoted $h_i^2$. These
communalities are then used to construct our reduced correlation matrix (Dillon et al,
1984: 66-67).

Since  FA  is  concerned  with  the  variation  in  observed  data  attributable  to  common 
factors,  we  need  to  include  some  measure  of  the  covariance  along  with  the  correlations. 
To  do  this,  we  replace  the  diagonal  elements  of  the  correlation  matrix  R  with  an  estimate 
of  each  variable’s  communality.  The  method  for  estimating  the  initial  communalities 
used in this thesis is the Squared Multiple Correlation (SMC). This value is found by
regressing each variable, one at a time, on all other variables in the dataset, and
calculating the resulting R-square value (Lattin et al, 2003: 136-7). Of course, we would
like to
regress  each  variable  on  the  common  factors.  However,  we  do  not  have  the  common 
factors  at  this  point,  only  the  variables  reflecting  these  factors.  Instead,  we  use  the  SMC 
which  provides  a  lower  bound  for  the  true  communalities  (Lattin  et  al,  2003:  137).  Recall 
that  in  linear  regression,  R-square  represents  the  amount  of  variation  in  a  variable 
attributable  to  the  variation  in  the  others.  Thus  our  reduced  correlation  matrix  is 

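$$R^* = \begin{bmatrix}
R_1^2 & \rho(1,2) & \cdots & \rho(1,p) \\
\rho(2,1) & R_2^2 & \cdots & \rho(2,p) \\
\vdots & \vdots & \ddots & \vdots \\
\rho(p,1) & \rho(p,2) & \cdots & R_p^2
\end{bmatrix}$$

where $R_i^2$ is the SMC of variable $i$ and $\rho(i,j)$ is the correlation between
variables $i$ and $j$.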
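
As a sketch of how the reduced correlation matrix can be built, the Python below uses NumPy and the standard identity that the SMC of variable i equals 1 - 1/(R^{-1})_{ii}, which avoids running p separate regressions. The correlation matrix is hypothetical, and the function name is ours.

```python
import numpy as np


def reduced_correlation(R):
    """Replace the diagonal of R with squared multiple correlations,
    using the identity SMC_i = 1 - 1 / (R^{-1})_{ii}."""
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_star = R.copy()
    np.fill_diagonal(R_star, smc)
    return R_star


# Hypothetical correlation matrix for three variables.
R = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]
R_star = reduced_correlation(R)
print(np.round(R_star, 3))
```

With only two variables the SMC reduces to the squared correlation between them, which gives a quick sanity check on the identity.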


The first common factor can now be calculated by computing the eigenvalues and
eigenvectors of R*. The largest eigenvalue, $\lambda_1$, and its eigenvector give the first
factor. In addition, the total amount of variance in the original dataset captured by this
first factor can be found by multiplying the first eigenvector by its transpose (Dillon et
al, 1984: 74). The eigenvalues and corresponding eigenvectors are all $(\lambda, u)$
solutions to

$$R^* u = \lambda u.$$

Finding solutions to this equation requires finding roots to a polynomial of order p. This
can be accomplished using computer-run algorithms which estimate numerical solutions.

To find the next factors, we subtract $C_1$, the portion of R* accounted for by the
first factor, from R* and compute the eigenvalues again. This process continues until the
largest eigenvalue no longer accounts for a significant amount of the remaining variation
in the data. A common rule of thumb is to extract only those factors with eigenvalues
greater than or equal to one.

At  this  point,  we  could  be  satisfied  with  a  set  of  previously  unmeasured  factors 
which  account  for  a  significant  amount  of  the  variation  in  a  dataset.  However,  we  can 
improve  on  the  estimation  of  these  factors  in  two  ways. 

First,  recall  that  the  communalities  used  to  construct  the  original  reduced 
correlation  matrix  were  estimated  through  regression.  But  what  we  are  interested  in  is 
the  true  communality  -  the  amount  of  variation  in  each  variable  attributable  to  the 
common  factors.  We  can  improve  on  our  initial  estimates  by  examining  the  correlations 
between  the  original  variables  and  the  common  factors  we  have  just  calculated  based  on 
our  initial  communality  estimates.  These  correlations  are  also  called  factor  loadings 
(Lattin et al, 2003: 136). For example, consider a model in which only two factors are
retained. If the variable $X_i$ is found to have loadings of 0.77 and -0.24 with the first
and second principal factors respectively, $R_i^2$ would be replaced by
$(0.77)^2 + (-0.24)^2 = 0.65$ in the new reduced correlation matrix, and we proceed as
before. This iterative process continues until there is very little change in communality
estimates (Lattin et al, 2003: 136). Typically, software packages perform this process
automatically, so there is no need to run FA multiple times.
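
The iterative refinement of communalities can be sketched with NumPy as below. This is an illustration under our own assumptions, not the routine used by any particular software package: the stopping tolerance and iteration cap are arbitrary, and the correlation matrix is a hypothetical one generated by a single factor with loadings 0.8, 0.7, and 0.6.

```python
import numpy as np


def principal_factors(R, n_factors, iterations=50, tol=1e-6):
    """Iterated principal factor extraction, starting from SMC communalities."""
    R = np.asarray(R, dtype=float)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # initial SMC estimates
    for _ in range(iterations):
        R_star = R.copy()
        np.fill_diagonal(R_star, h2)
        eigvals, eigvecs = np.linalg.eigh(R_star)
        top = np.argsort(eigvals)[::-1][:n_factors]
        # Loadings: eigenvectors scaled by the square roots of their eigenvalues.
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        # New communality of each variable: sum of its squared loadings.
        new_h2 = (loadings ** 2).sum(axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings, h2


# Hypothetical one-factor structure: off-diagonal r_ij = l_i * l_j.
l = np.array([0.8, 0.7, 0.6])
R = np.outer(l, l)
np.fill_diagonal(R, 1.0)

loadings, h2 = principal_factors(R, n_factors=1)
print(np.round(h2, 3))
```

The sign of each column of loadings is arbitrary, since an eigenvector and its negative are equally valid.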

A  second  technique  for  improving  our  factor  model  does  not  deal  as  much  with 
determining  the  underlying  structure  of  the  data,  but  rather  with  interpreting  the  structure. 
Factor  Rotation  involves  reorienting  the  principal  factors  in  such  a  way  that  the  original 
variables,  to  the  extent  possible,  become  more  heavily  loaded  on  one  factor  than  the 
others.  The  desired  results  of  such  a  rotation  are  described  in  Lattin  et  al,  2003  page  139: 

1. Most of the loadings on any specific factor (column) should be small (as close
to zero as possible), and only a few loadings should be large in absolute value.

2.  A  specific  row  of  the  loadings  matrix,  containing  the  loadings  of  a  given 
variable  with  each  factor,  should  display  nonzero  loadings  on  only  one  or  no 
more  than  a  few  factors. 

3.  Any  pair  of  factors  (columns)  should  exhibit  different  patterns  of  loadings. 
Otherwise,  one  could  not  distinguish  the  two  factors  represented  by  these 
columns. 

A common rotation technique, and the one used in this thesis, is Kaiser's Varimax
Rotation. We define the elements of the loadings matrix $\Lambda$ derived from the FA
procedure as $r_{ik}$, equal to the correlation of variable $i$ with common factor $k$.
Then the proportion of variation in variable $i$ which is attributable to $k$ is
$(r_{ik})^2$; summed over the factors, these proportions give the communality of $i$, and
the total communality of the factor model is the sum of these individual communalities.
Our goal in rotating the factors is to find a rotation matrix $T$ which yields a new
loadings matrix $A$ such that $A = \Lambda T$ and, to the greatest extent possible, the
elements of $A$, $a_{ik}$, are such that $a_{ik}$ is close to 1 (or -1) or zero (Lattin et
al, 2003: 145).

To accomplish this, we maximize

$$V = \sum_{k=1}^{c}\sum_{i=1}^{p} a_{ik}^4 - \frac{1}{p}\sum_{k=1}^{c}\left(\sum_{i=1}^{p} a_{ik}^2\right)^2$$

where

c = number of retained factors
p = number of variables in the dataset
$a_{ik}$ = correlation of the ith variable with the kth rotated factor
(Lattin et al, 2003: 145).
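
To make the criterion concrete, the sketch below evaluates V for a two-factor loadings matrix and searches a grid of planar rotation angles for the one that maximizes it. The grid search is our simplification for illustration; production Varimax routines use an iterative algorithm instead. The loadings are hypothetical: a clean simple structure deliberately rotated by 30 degrees.

```python
import numpy as np


def varimax_criterion(A):
    """V = sum_k [ sum_i a_ik^4 - (1/p) (sum_i a_ik^2)^2 ]."""
    A = np.asarray(A, dtype=float)
    p = A.shape[0]
    sq = A ** 2
    return float((sq ** 2).sum() - (sq.sum(axis=0) ** 2).sum() / p)


def best_planar_rotation(L, steps=3600):
    """Grid search over rotation angles for a two-factor loadings matrix."""
    L = np.asarray(L, dtype=float)
    best_v, best_T = varimax_criterion(L), np.eye(2)
    for theta in np.linspace(0.0, np.pi / 2, steps, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        T = np.array([[c, -s], [s, c]])
        v = varimax_criterion(L @ T)
        if v > best_v:
            best_v, best_T = v, T
    return best_v, best_T


# Hypothetical simple structure, then rotated 30 degrees to blur it.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.6]])
c30, s30 = np.cos(np.pi / 6), np.sin(np.pi / 6)
L = simple @ np.array([[c30, -s30], [s30, c30]])

best_v, best_T = best_planar_rotation(L)
print(varimax_criterion(L), best_v)
```

Because T is orthogonal, the rotation changes how each variable's communality is split between the factors but not the communality itself.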

The result of the Varimax Rotation may be better understood visually. In Figure
3-4, the graph on the left shows a fictitious relationship between several original variables
and two common factors which have not been rotated. The graph on the right shows their
relationship after factor rotation. As you can see, although the structure of the data has
not changed, it should now be easier to characterize each of the factors because they are
more closely related to quantifiable variables we have collected.


[Figure: panels "Unrotated" and "Rotated"; axis labels include "First Principal Factor" and "Economy"]

Figure 3-4: Conceptual Plot of Rotated and Unrotated Factor Loadings

(Lattin et al, 2003: 140)

Following FA and Rotation, we obtain a loadings matrix which provides the latent
structure  of  our  original  dataset.  As  suggested  by  Figure  3-4,  we  can  group  variables  by 
the  principal  factor  on  which  they  are  most  heavily  loaded,  and  label  the  factors  so  that 
we  may  better  understand  the  nature  of  the  data.  If  we  could  measure  each  of  these 
factors  directly,  we  would  be  able  to  account  for  a  significant  proportion  of  the 
information  available  in  the  original,  larger  dataset  with  only  a  few  variables.  Recall, 
however,  that  the  objective  of  FA  is  only  to  uncover  the  immeasurable  factors  reflected 
in the observable data. For practical purposes then, the next step is to select the variables
we can measure which load on the principal factors in such a way that they most
closely approximate the structure of the data. We used two approaches to select variables
based on factor loadings for this study. The first was to select at least one variable which
loaded heavily on each of the retained factors, using a minimum loading of 0.5. Because
loadings are correlations, this threshold indicates that at least a quarter (0.5^2 = 0.25)
of the variance in a given variable can be attributed to the common factor. The second
method was to choose all variables which loaded with a value of at least 0.5 on the most
significant factors. Therefore, we chose variables in order of their loadings on the first
principal factor until all scoring at least 0.5 were in the model. We then moved on to the
next factor, and so on. As shown in Chapter 4, the two methods resulted in surprisingly
similar sets of selected variables.
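
The two loading-based selection rules can be sketched as follows. The variable names and loadings are hypothetical, and picking the single heaviest variable per factor is our simplification of "at least one variable" for the first rule.

```python
def top_variable_per_factor(loadings, cutoff=0.5):
    """First approach: the heaviest-loading variable on each factor,
    provided its absolute loading reaches the cutoff."""
    picks = []
    n_factors = len(next(iter(loadings.values())))
    for k in range(n_factors):
        name, row = max(loadings.items(), key=lambda item: abs(item[1][k]))
        if abs(row[k]) >= cutoff and name not in picks:
            picks.append(name)
    return picks


def all_heavy_variables(loadings, n_factors, cutoff=0.5):
    """Second approach: every variable at or above the cutoff on the leading
    factors, taken factor by factor in descending order of loading."""
    picks = []
    for k in range(n_factors):
        heavy = sorted(
            (name for name, row in loadings.items() if abs(row[k]) >= cutoff),
            key=lambda name: -abs(loadings[name][k]),
        )
        picks.extend(name for name in heavy if name not in picks)
    return picks


# Hypothetical rotated loadings on two factors.
loadings = {
    "gdp_per_capita": [0.91, 0.10],
    "phones_per_1000": [0.84, 0.22],
    "infant_mortality": [-0.78, 0.35],
    "arable_land": [0.05, 0.66],
}
one_each = top_variable_per_factor(loadings)
leading = all_heavy_variables(loadings, n_factors=1)
print(one_each, leading)
```

Absolute values are used because a large negative loading ties a variable to a factor just as strongly as a large positive one.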

3.3.2.  Discriminant  Analysis 

Discriminant Analysis (DA) finds the linear combination of the independent
variables, or a subset of these variables, which produces the greatest difference between
two or more predefined groups (Lattin et al, 2003: 429). As we have seen from our
discussion of Factor Analysis, it may not be necessary to use the entire set of independent
variables, particularly if the true differences among the groups lie in a smaller number of
underlying dimensions. The following sections describe two ways we can choose the
variables most useful in building the discriminant functions - Comparing Mean
Differences in the Full Model and Stepwise Selection.


3.3.2.1. Full Model Variable Selection in Discriminant Analysis

We  begin  by  considering  the  two-group  problem  for  illustrative  purposes.  The 
results  can  be  readily  extended  to  the  three-group  situation  used  in  this  thesis,  as  will  be 
described  later.  For  a  given  set  of  variables  X,  we  desire  to  find  the  vector  of  coefficients 
k  to  create  the  greatest  difference  in  the  discriminant  function  scores  t  =  Xk  between 
members  of  the  two  groups  (Lattin  et  al,  2003:  436-7).  To  measure  this  difference,  we 
compare  the  sum  of  squares  within  each  group  to  the  sum  of  squares  across  all  groups. 
We  hope  to  simultaneously  find  the  largest  across-group  variance  and  the  smallest 
within-group  variance  which  can  be  done  by  maximizing  the  ratio 

$$\frac{SS_A}{SS_W}.$$

We calculate the sum of squares across groups by

$$SS_A = k'\left[n_1(\bar{x}_{(1)} - \bar{x})(\bar{x}_{(1)} - \bar{x})' + n_2(\bar{x}_{(2)} - \bar{x})(\bar{x}_{(2)} - \bar{x})'\right]k$$

and the sum of squares within groups by

$$SS_W = \sum_i k'(x_{i(1)} - \bar{x}_{(1)})(x_{i(1)} - \bar{x}_{(1)})'k + \sum_i k'(x_{i(2)} - \bar{x}_{(2)})(x_{i(2)} - \bar{x}_{(2)})'k$$

where $\bar{x}$, $\bar{x}_{(1)}$, and $\bar{x}_{(2)}$ are the vectors of variable means for
the entire sample, Group 1 and Group 2 respectively, and $n_1$ and $n_2$ are the sizes of
the two groups. Rewriting our original objective function, we wish to choose k to maximize

$$\frac{k'\left[n_1(\bar{x}_{(1)} - \bar{x})(\bar{x}_{(1)} - \bar{x})' + n_2(\bar{x}_{(2)} - \bar{x})(\bar{x}_{(2)} - \bar{x})'\right]k}{\sum_i k'(x_{i(1)} - \bar{x}_{(1)})(x_{i(1)} - \bar{x}_{(1)})'k + \sum_i k'(x_{i(2)} - \bar{x}_{(2)})(x_{i(2)} - \bar{x}_{(2)})'k}.$$

Differentiating with respect to k, setting the result equal to zero, and simplifying, we see
that we can choose k as follows:

$$k \propto C_W^{-1}\left(\bar{x}_{(1)} - \bar{x}_{(2)}\right)$$

where $C_W$ is the pooled within-group covariance matrix (Lattin et al, 2003: 436-7).
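
The closed-form solution for k can be checked numerically. The NumPy sketch below computes k for two small hypothetical groups and evaluates the across-to-within ratio so it can be compared against other candidate directions; the function names and data are ours.

```python
import numpy as np


def fisher_direction(X1, X2):
    """k proportional to C_W^{-1} (mean of group 1 - mean of group 2)."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled within-group covariance matrix C_W.
    Cw = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2 - 2)
    return np.linalg.solve(Cw, m1 - m2)


def separation_ratio(k, X1, X2):
    """SS_A / SS_W for the discriminant scores t = Xk."""
    X1, X2, k = np.asarray(X1, float), np.asarray(X2, float), np.asarray(k, float)
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    m = np.vstack([X1, X2]).mean(axis=0)
    ss_a = len(X1) * (k @ (m1 - m)) ** 2 + len(X2) * (k @ (m2 - m)) ** 2
    s1, s2 = (X1 - m1) @ k, (X2 - m2) @ k
    return float(ss_a / (s1 @ s1 + s2 @ s2))


# Hypothetical two-variable observations for two groups.
X1 = [[2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [3.5, 4.0]]
X2 = [[6.0, 2.0], [7.0, 3.5], [8.0, 3.0], [7.5, 4.5]]

k = fisher_direction(X1, X2)
best = separation_ratio(k, X1, X2)
print(k, best)
```

Only the direction of k matters here: rescaling k multiplies both sums of squares by the same factor and leaves the ratio unchanged.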

Clearly,  the  variables  for  which  there  is  the  greatest  difference  in  means  between 
groups contribute most significantly to this quantity. We may choose to use all p
variables from our dataset, in which case k will be a p x 1 vector of coefficients. In that
case, all variable differences between groups, regardless of degree, would be used for
discrimination.  However,  if  our  goal  is  to  reduce  the  number  of  variables  required  for 
classification,  we  extract  only  those  variables  for  which  the  differences  among  groups  are 
substantial.  We  do  this  through  F-tests  on  the  differences  between  means  across  groups. 

Our first step is to test if there is any significant difference between the groups as
a whole. If there is no significant difference, no variable or group of variables will be
sufficient to discriminate. We perform an F-test with Hotelling's $T^2$ test statistic for
this purpose, where

$$T^2 = \frac{n_1 n_2}{n_1 + n_2}\left(\bar{x}_{(2)} - \bar{x}_{(1)}\right)' C_W^{-1} \left(\bar{x}_{(2)} - \bar{x}_{(1)}\right)$$

and

$$\frac{n_1 + n_2 - p - 1}{p(n_1 + n_2 - 2)}\,T^2 \sim F(p,\ n_1 + n_2 - p - 1)$$

(Lattin et al, 2003: 446). The null hypothesis is that there is no difference among groups.
We reject the null in favor of the alternative that there is a significant difference if the
scaled $T^2$ value exceeds the critical F value.
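
A minimal NumPy sketch of the two-group test statistic follows. The data are hypothetical, and the comparison against a critical F value is left out because it would require an F-distribution routine from an additional library.

```python
import numpy as np


def hotelling_t2(X1, X2):
    """Return T^2, the scaled F statistic, and its degrees of freedom."""
    X1, X2 = np.asarray(X1, float), np.asarray(X2, float)
    n1, n2, p = len(X1), len(X2), X1.shape[1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled within-group covariance matrix C_W.
    Cw = ((X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)) / (n1 + n2 - 2)
    d = m2 - m1
    t2 = n1 * n2 / (n1 + n2) * (d @ np.linalg.solve(Cw, d))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return float(t2), float(f), (p, n1 + n2 - p - 1)


# Hypothetical two-variable observations for two groups.
X1 = [[2.0, 3.0], [3.0, 3.5], [4.0, 5.0], [3.5, 4.0]]
X2 = [[6.0, 2.0], [7.0, 3.5], [8.0, 3.0], [7.5, 4.5]]

t2, f_stat, dof = hotelling_t2(X1, X2)
print(t2, f_stat, dof)
```

With a single variable (p = 1), T^2 reduces to the square of the familiar pooled two-sample t statistic, which is a convenient check on the implementation.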

For the three group problem we use Rao's F-Test for Wilks' $\Lambda$. Wilks'
$\Lambda$ is defined as

$$\Lambda = \frac{|S_E|}{|S_T|}$$

where $|S_E|$ is the determinant of the residual error sum of squares matrix after
accounting for the variance explained by the independent variables, and $|S_T|$ is the
determinant of the total sum of squares matrix (Lattin et al, 2003: 333). We calculate
Rao's test statistic

$$Ra = \left(\frac{1 - \Lambda^{1/s}}{\Lambda^{1/s}}\right)\left(\frac{1 + ts - pq/2}{pq}\right)$$

where

$$t = (n - 1) - \frac{p + q + 1}{2}, \qquad s = \sqrt{\frac{p^2q^2 - 4}{p^2 + q^2 - 5}}$$

n = the number of observations
p = the number of X variables
q = the number of groups - 1

$Ra$ follows an F-distribution with $pq$ and $1 + ts - \frac{1}{2}pq$ degrees of freedom
(Lattin et al, 2003: 335).

Once we have determined that there is a difference among groups, we can test for
differences with respect to individual variables. Our test statistic is computed in a similar
fashion, but we use only one variable at a time to calculate the differences. This has the
effect of determining if any individual variables are useful to discriminate among groups
given the other variables in the model (Lattin et al, 2003: 446).
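
Rao's approximation for the overall three-group test, as defined earlier in this section, can be sketched as a small Python function. Setting s = 1 when $p^2 + q^2 = 5$ follows the usual convention for that degenerate case and is our addition; the example inputs are hypothetical.

```python
import math


def rao_f(wilks_lambda, n, p, q):
    """Rao's F approximation for Wilks' lambda.

    Returns the statistic and its two degrees of freedom.
    """
    denom = p ** 2 + q ** 2 - 5
    # Conventional choice s = 1 when the ratio is 0/0 (p^2 + q^2 = 5).
    s = 1.0 if denom == 0 else math.sqrt((p ** 2 * q ** 2 - 4) / denom)
    t = (n - 1) - (p + q + 1) / 2
    df1 = p * q
    df2 = 1 + t * s - p * q / 2
    root = wilks_lambda ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


# Hypothetical case: 20 observations, 2 variables, 3 groups (q = 2).
ra, df1, df2 = rao_f(0.25, n=20, p=2, q=2)
print(ra, df1, df2)
```

Smaller values of Wilks' lambda (less unexplained variation) produce larger values of the statistic.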

These  tests  of  the  significance  of  individual  variables  provide  the  basis  for  our 
Full  Model  variable  selection.  Using  the  computed  F  statistics,  we  ranked  the  variables 
in  descending  order.  We  then  attempted  to  classify  countries  using  progressively  more 
variables  until  the  resulting  accuracy  no  longer  showed  substantial  improvement,  and 
built  discriminant  functions  based  on  the  resulting  set  of  variables. 

3.3.2.2.  Stepwise  Variable  Selection  in  Discriminant  Analysis 

The  mechanics  involved  in  using  the  Stepwise  approach  to  building  the 
discriminant  model  are  identical  to  those  described  above,  except  that  in  this  case,  rather 
than  using  all  available  variables  in  the  initial  model,  we  apply  an  iterative  approach  to 
construct  the  model  one  variable  at  a  time.  To  do  this,  the  individual  F-tests  are 
completed  on  each  variable  and  the  most  significant  is  added  to  the  model.  The  second 
most  significant  variable  is  then  added,  and  so  on.  At  each  step,  the  significance  of  each 
variable  in  the  new  model  is  tested  as  described  above  for  the  full  model.  Any  variable 
which  no  longer  significantly  contributes  to  the  model  is  removed. 
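
The forward step of this procedure can be sketched with NumPy as below. This is a simplified illustration, not the procedure as run in this study: it ranks candidate variables by the Wilks' lambda they produce instead of by F-tests, stops when lambda drops by less than a hypothetical threshold (min_drop), and omits the removal step for variables that stop contributing. The data are hypothetical.

```python
import numpy as np


def wilks_lambda(X, groups):
    """|within-group SSCP| / |total SSCP| for the given columns."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    total = centered.T @ centered
    within = np.zeros_like(total)
    for g in set(groups):
        Xg = X[[i for i, label in enumerate(groups) if label == g]]
        dev = Xg - Xg.mean(axis=0)
        within += dev.T @ dev
    return float(np.linalg.det(within) / np.linalg.det(total))


def forward_select(X, groups, min_drop=0.05):
    """Add, one at a time, the variable that lowers Wilks' lambda the most."""
    X = np.asarray(X, dtype=float)
    chosen, current = [], 1.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {v: wilks_lambda(X[:, chosen + [v]], groups) for v in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] < min_drop:
            break  # no remaining variable improves separation enough
        chosen.append(best)
        remaining.remove(best)
        current = scores[best]
    return chosen


# Hypothetical data: column 0 separates the three groups, the others do not.
X = [
    [1.0, 5.0, 0.3], [1.2, 5.5, 0.9], [0.8, 4.0, 0.5],
    [3.0, 5.2, 0.4], [3.1, 4.1, 0.8], [2.9, 5.6, 0.6],
    [5.0, 4.5, 0.7], [5.2, 5.4, 0.2], [4.8, 4.9, 0.9],
]
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
chosen = forward_select(X, groups)
print(chosen)
```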

The method just described is called forward stepwise selection because we start
with an empty model and add variables until no improvement is realized. Conversely,
backward selection involves starting with all variables in the initial model and removing
variables one at a time until the resulting model no longer provides sufficient accuracy,
retesting for significance of individual variables remaining in the model at each step. We
employed both of these methods, along with the full model selection, for comparison's
sake. As expected, these three methods provided similar lists of key variables.

3.3.2.3.  Three  Group  Discriminant  Analysis 

The extension of the two-group discriminant problem to three groups is fairly
straightforward. The solution is to add a second discriminant function perpendicular to
the first which can further discriminate among groups. This can be seen more easily with
sample plots.


Figure  3-5:  Multiple  Discriminant  Functions  for  the  Three  Group  Problem 

(Lattin  et  al,  2003:  457-8) 
Here we see that the first discriminant function provides separation between Group 1 and
the other two, while the second is needed to distinguish Group 2 from Group 3.

Therefore,  our  DA  resulted  in  two  mutually  orthogonal  discriminant  functions. 
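
A small numerical sketch of why three groups yield two discriminant functions is given below. It assumes the usual formulation in which the functions are eigenvectors of W^-1 B (within-group and between-group sums of squares and cross products); the data are synthetic and the function name discriminant_functions is hypothetical. In this formulation the two sets of discriminant scores come out uncorrelated.

```python
import numpy as np


def discriminant_functions(X, y):
    """Eigen-decomposition of W^-1 B, sorted by decreasing eigenvalue."""
    p = X.shape[1]
    overall_mean = X.mean(axis=0)
    W = np.zeros((p, p))
    B = np.zeros((p, p))
    for g in np.unique(y):
        Xg = X[y == g]
        mean_g = Xg.mean(axis=0)
        d = Xg - mean_g
        W += d.T @ d                                   # within-group SSCP
        diff = (mean_g - overall_mean).reshape(-1, 1)
        B += len(Xg) * (diff @ diff.T)                 # between-group SSCP
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    order = np.argsort(eigvals.real)[::-1]
    return eigvals.real[order], eigvecs.real[:, order]


# Synthetic data (assumption): 3 groups, 4 variables, 40 cases per group.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 40)
X = rng.normal(size=(120, 4))
X[y == 1, 0] += 4.0      # Group 2 differs from Group 1 on variable 0
X[y == 2, 1] += 4.0      # Group 3 differs from Group 1 on variable 1

eigvals, eigvecs = discriminant_functions(X, y)
n_functions = int(np.sum(eigvals > 1e-8))              # min(groups - 1, variables)
scores = (X - X.mean(axis=0)) @ eigvecs[:, :n_functions]
print("useful discriminant functions:", n_functions)
```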

3.3.3.  Summary  of  Variable  Selection  Procedures 

The  methods  described  in  the  preceding  sections  were  used  to  achieve  the  first 
primary  goal  of  this  effort  -  to  identify  a  relatively  small  subset  of  variables  that  can  be 
used  to  successfully  discriminate  between  stable  and  unstable  states.  The  next  section 
builds  on  these  results  by  developing  discriminant  functions  for  classifying  states  based 
on  these  variables. 

3.4.  Classifying  States 

Recall  that  in  order  to  use  DA  to  classify  observations  into  groups,  we  must  begin 
with  a  hypothesized,  a  priori  classification.  For  the  initial  Discriminant  Analysis  in  this 
study,  Thomas  Barnett’s  identification  of  Core,  Rim,  and  Gap  states  served  as  a  proxy  for 
ground  truth  as  no  official  governmental  classification  of  failing  states  was  available  at 
the  open  source  level.  We  first  tested  this  classification  to  see  if  there  truly  do  appear  to 
be differences among groups. Next, we performed several Discriminant Analyses using
the  attributes  chosen  through  the  variable  selection  techniques  described  earlier. 
Following  an  extensive  look  at  the  Barnett  classifications,  we  compare  results  with  the 
Fund  for  Peace  2006  Failed  State  Index  using  the  same  variables.  The  variables  used  and 
the  results  of  the  DA  can  be  found  in  Chapter  4. 

3.5. Chapter Summary

This  chapter  outlines  the  various  methods  we  employed  to  achieve  the  two 
primary  objectives  of  this  study.  First,  it  describes  the  techniques  we  used  to  select  the 
variables  most  important  for  the  purpose  of  assessing  state  stability.  Second,  it  provides 
the  mechanics  of  Discriminant  Analysis  which  we  used  to  classify  states  in  terms  of  their 
overall  stability.  Chapter  4  provides  the  results  of  these  analyses. 

4. Analysis Results


4.1.  Introduction 

This  chapter  contains  the  results  of  the  variable  selection  and  Discriminant 
Analysis  described  in  Chapter  3.  Significant  conclusions  and  recommendations  for  future 
study  are  outlined  in  Chapter  5. 

4.2.  Variable  Selection 

We  used  several  methods  to  identify  the  key  variables  most  useful  in  classifying 
states  as  stable,  borderline,  or  unstable.  The  results  of  each  method  are  presented  here. 

4.2.1.  Exploratory  Factor  Analysis 

As  described  in  the  Methodology  section  of  this  paper,  Factor  Analysis  (FA)  was 
used  to  reduce  the  dimensionality  of  our  final  dataset  by  uncovering  its  underlying 
structure. A key product of FA is a matrix of factor loadings, as shown in Table 4-1.
Columns  represent  the  common  factors,  and  rows  represent  each  of  the  variables  in  our 
dataset. 

Table 4-1: Factor Loadings Matrix Before Rotation

Variable              F1      F2      F3      F4      F5      F6      F7      F8      F9     F10
Log(A100)         -0.292   0.802  -0.259  -0.126  -0.131   0.208   0.016  -0.118  -0.011   0.019
A110               0.377  -0.225   0.055   0.005   0.061   0.140   0.138  -0.167   0.079  -0.069
A113               0.230   0.032  -0.213  -0.062   0.021   0.301   0.143   0.110   0.033  -0.173
A114               0.336  -0.140   0.024  -0.221  -0.298   0.152  -0.175  -0.033  -0.065   0.081
A116              -0.437   0.218  -0.295  -0.033  -0.288   0.133   0.124   0.147   0.273  -0.043
Log(A118)         -0.172   0.541  -0.117   0.073  -0.068  -0.084   0.084   0.072   0.491   0.037
Log(A119)          0.902   0.085   0.046   0.203   0.058   0.038   0.017   0.079   0.032   0.043
A120              -0.608   0.148  -0.320  -0.069  -0.190   0.043   0.021   0.149   0.071  -0.097
A122              -0.549   0.017  -0.439   0.153  -0.058  -0.191   0.080  -0.071  -0.078   0.042
A124              -0.146   0.136   0.134  -0.235  -0.102   0.063   0.013  -0.160   0.089  -0.186
Log(A126)         -0.046  -0.512   0.327  -0.202  -0.031  -0.082   0.029   0.008   0.148   0.117
Log(A130)         -0.307   0.178  -0.155  -0.200   0.096  -0.185  -0.176   0.023  -0.048  -0.303
Log(A133)          0.446  -0.157  -0.451   0.111   0.266  -0.073   0.163   0.009   0.005   0.007
A135               0.321  -0.055  -0.074   0.289   0.093  -0.443   0.082  -0.158  -0.084  -0.141
Log(A136+0.001)    0.004  -0.091  -0.187   0.061  -0.229  -0.135  -0.031   0.132  -0.232   0.013
A141              -0.390  -0.518   0.021  -0.040   0.073   0.004   0.024   0.060   0.191   0.107
Log(A144)          0.044   0.203  -0.200  -0.005  -0.156  -0.557  -0.082   0.185   0.000  -0.068
Log(A152)         -0.464   0.147  -0.445  -0.049   0.267  -0.108  -0.214  -0.057   0.069  -0.231
Log(A153)         -0.052  -0.304   0.096  -0.059   0.070   0.230  -0.316   0.229   0.073   0.199
Log(A155)         -0.346   0.705  -0.297  -0.125   0.281   0.119  -0.077   0.007   0.016   0.283
Log(A159+0.001)    0.066  -0.139  -0.118   0.213  -0.188  -0.139   0.379  -0.106   0.113   0.404
Log(A160+0.001)    0.204  -0.067  -0.288   0.072  -0.195  -0.090   0.204  -0.034   0.046   0.144
Log(A166)          0.206  -0.133   0.146   0.075  -0.638   0.106   0.154  -0.105  -0.015  -0.415
Log(A167+0.001)   -0.532  -0.187  -0.180   0.395   0.194   0.226   0.059   0.099  -0.173   0.001
Log(A172+0.001)    0.012   0.433   0.409  -0.311   0.048  -0.193   0.302   0.259  -0.238  -0.012
Log(A174)         -0.871  -0.018   0.142  -0.014   0.103   0.023   0.101  -0.067  -0.103   0.064
A175               0.630  -0.249  -0.195   0.044   0.002  -0.024  -0.144  -0.055  -0.129  -0.040
A177               0.575  -0.014  -0.088  -0.245   0.136  -0.071  -0.063  -0.199  -0.109  -0.045
Log(A180)         -0.270  -0.137  -0.024   0.388   0.050   0.037  -0.027   0.179  -0.068  -0.025
Log(A181)          0.205  -0.257   0.105  -0.075   0.004  -0.040  -0.040   0.013  -0.044  -0.082
Log(A182)          0.389   0.422   0.135   0.038   0.195   0.219   0.066   0.173   0.111  -0.157
Log(A184)          0.208  -0.317  -0.193  -0.412  -0.020  -0.110  -0.058   0.173   0.138   0.063
A185               0.688   0.186  -0.085   0.174   0.080   0.021  -0.066   0.205  -0.108   0.083
A186               0.167   0.247  -0.063  -0.059   0.309   0.233  -0.261   0.262  -0.110  -0.036
A190              -0.886   0.021   0.186   0.170   0.095  -0.095   0.018   0.087   0.072  -0.064
A192               0.604   0.086  -0.148  -0.053  -0.183  -0.121  -0.195  -0.105  -0.081   0.073
Log(A193)         -0.819  -0.149   0.047  -0.091   0.006  -0.119   0.028   0.020  -0.040  -0.021
Log(A209)          0.005   0.415   0.088   0.046   0.120  -0.101   0.105  -0.205  -0.059   0.134
A211              -0.007  -0.190  -0.058  -0.209  -0.319   0.081  -0.103   0.088  -0.201   0.083
Log(A215)         -0.832   0.025   0.002  -0.051   0.084  -0.175   0.079  -0.082  -0.072  -0.102
Log(A216)         -0.459   0.145   0.074   0.256  -0.001  -0.282  -0.342  -0.096   0.150   0.021
Log(A221+0.001)   -0.023   0.657   0.332  -0.286  -0.029  -0.109   0.317   0.254  -0.251   0.034
A225               0.332   0.205   0.336  -0.232   0.118  -0.304  -0.056   0.063   0.151   0.037
Log(A231+0.001)    0.877   0.088  -0.264   0.054   0.001  -0.103  -0.048  -0.053   0.033   0.033
A236              -0.124   0.086  -0.131   0.016  -0.164   0.130   0.109   0.098  -0.016   0.010
A239              -0.236   0.087  -0.234  -0.133   0.023  -0.059   0.105   0.081  -0.114   0.093
Log(-A243)         0.532   0.671  -0.126   0.157  -0.191   0.159   0.050  -0.068  -0.018  -0.036
A244              -0.008  -0.049  -0.053   0.054  -0.005  -0.100  -0.075  -0.036  -0.128  -0.019
Log(A246+0.001)    0.541  -0.177  -0.146   0.032   0.054   0.035   0.051   0.127   0.172  -0.138
A247              -0.151   0.172  -0.170   0.081  -0.063  -0.240  -0.087   0.320   0.043  -0.029
Log(A248)         -0.147   0.495   0.016   0.020  -0.276   0.042  -0.256  -0.158  -0.121   0.141
A250               0.120   0.217  -0.025   0.560   0.037   0.113  -0.016   0.025  -0.173  -0.078
A252              -0.281   0.470  -0.022  -0.288   0.089   0.138  -0.055  -0.351   0.098  -0.027
A253              -0.707  -0.312   0.042   0.013   0.254   0.170   0.111  -0.079  -0.062  -0.024
A257              -0.603  -0.272  -0.349  -0.325   0.125   0.010   0.040  -0.182  -0.165  -0.034
A259               0.552  -0.140  -0.368  -0.366  -0.020  -0.017  -0.125  -0.018  -0.070   0.102
Log(A262)          0.486   0.276   0.484   0.101   0.158  -0.081  -0.204  -0.086   0.085  -0.042
Log(A263)          0.449  -0.255  -0.265  -0.306   0.180   0.034   0.255   0.104   0.008  -0.096
Log(A264)          0.701   0.043  -0.138  -0.077   0.329  -0.086   0.225  -0.047   0.099  -0.117
Log(A266+0.001)    0.398   0.135   0.024   0.067   0.214  -0.007   0.204  -0.223  -0.068   0.068

Shown are the factor loadings for each variable on the first ten principal factors.
Loadings with an absolute value above 0.5 are highlighted in bold type. Again, in the
interest of space, 3-digit codes have been used in place of series descriptions. Complete
variable names are provided in Appendix B. Where Log(AXXX) is shown, the variable was
transformed using the natural logarithm. Log(AXXX + 0.001) indicates that an epsilon
value was added to allow us to take the natural log of variables which contained zeroes.
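
The two transformations can be illustrated with a short sketch. The data values and the helper name log_transform are hypothetical; only the epsilon of 0.001 comes from the text.

```python
import math

EPSILON = 0.001  # offset used in the text for variables containing zeroes


def log_transform(values, epsilon=0.0):
    """Natural log of each value, optionally shifted by a small epsilon."""
    return [math.log(v + epsilon) for v in values]


# Hypothetical series with a zero entry: the plain log would fail on 0.0.
series_with_zero = [0.0, 12.5, 30.0]
shifted = log_transform(series_with_zero, EPSILON)   # Log(AXXX + 0.001)

# Hypothetical strictly positive series: no offset is needed.
plain = log_transform([1.0, 250.0, 40000.0])         # Log(AXXX)
print(shifted[0], plain[0])
```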

Note  that  with p  independent  variables,  as  many  as  p  principal  factors  are 
possible.  Our  decision  to  retain  only  the  first  ten  factors  was  aided  by  the  eigenvalues 
corresponding  to  each  factor,  and  a  Scree  Plot  showing  the  amount  of  variation  in  the 
dataset  explained  by  additional  factors. 

Table 4-2: Eigenvalues and Variance Accounted for Before Rotation

Factor   Eigenvalue   Variability (%)   Cumulative %
F1           12.321            20.535         20.535
F2            5.274             8.790         29.326
F3            2.798             4.663         33.989
F4            2.231             3.719         37.707
F5            1.899             3.165         40.873
F6            1.619             2.698         43.570
F7            1.376             2.293         45.863
F8            1.183             1.971         47.834
F9            1.027             1.712         49.547
F10           0.964             1.607         51.153
F11           0.785             1.308         52.462
F12           0.688             1.147         53.609
F13           0.591             0.986         54.594
F14           0.539             0.899         55.493
F15           0.492             0.820         56.313
F16           0.447             0.744         57.058
F17           0.396             0.660         57.718
F18           0.364             0.607         58.325
F19           0.362             0.603         58.928
F20           0.299             0.498         59.427
F21           0.253             0.422         59.848
F22           0.234             0.390         60.238
F23           0.211             0.352         60.591
F24           0.156             0.261         60.851
F25           0.153             0.254         61.106
F26           0.114             0.189         61.295
F27           0.094             0.157         61.453
F28           0.064             0.107         61.560
F29           0.029             0.049         61.609
F30           0.007             0.012         61.621

Figure 4-1: Plot of Eigenvalues and Explained Variance (Scree Plot)

After the first two principal factors, there appears to be relatively little increase in
the amount of variability explained by adding additional factors to the model, and almost
no improvement after the first ten factors are included. The first two factors account for
approximately 30% of the total variation in the data, whereas Factors 11 through 30
together account for only about 10% (61.6% less 51.2% in Table 4-2). This suggests that we
capture a large portion of the total variation available in the dataset with the first few
factors, which is typical in FA.
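
The percentages in Table 4-2 can be reproduced from the eigenvalues. The sketch below assumes the analysis was run on 60 standardized variables, so that the total variance equals 60; this assumption reproduces the tabled values to rounding. The function name variance_shares is hypothetical.

```python
# First ten eigenvalues from Table 4-2.
eigenvalues = [12.321, 5.274, 2.798, 2.231, 1.899,
               1.619, 1.376, 1.183, 1.027, 0.964]
N_VARIABLES = 60  # assumed total variance (one unit per standardized variable)


def variance_shares(eigs, n_vars):
    """Percent of total variance per factor, and the running total."""
    shares = [100.0 * e / n_vars for e in eigs]
    cumulative, running = [], 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative


shares, cumulative = variance_shares(eigenvalues, N_VARIABLES)
print(f"F1 alone:        {shares[0]:.3f}%")
print(f"F1 and F2:       {cumulative[1]:.3f}%")
print(f"F1 through F10:  {cumulative[9]:.3f}%")
```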

To aid in factor characterization and variable selection, we next performed a
Varimax rotation. We retained those factors with eigenvalues greater than or equal to
1.0; rounding the tenth eigenvalue of 0.964 up to 1.0, this meant retaining the first ten
factors. The resulting variance accounted for and factor loadings are shown here.


Table 4-3: Variance Accounted for After Varimax Rotation

Factor   Variability (%)   Cumulative %
D1                19.575         19.575
D2                 7.605         27.180
D3                 4.335         31.515
D4                 3.215         34.730
D5                 2.704         37.435
D6                 3.110         40.545
D7                 3.711         44.256
D8                 2.514         46.770
D9                 2.151         48.921
D10                2.232         51.153

Note that the amount of variation explained by each factor may have changed, but
the overall variation explained by the first ten rotated factors remains the same at 51%.
We have not changed the structure of the data, only its orientation so that we may better
understand it.
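
The invariance noted above can be checked numerically. The sketch below uses a common SVD-based Varimax iteration, which is an assumption about the algorithm and not necessarily the routine used in this study, and applies it to four rows and two columns taken from Table 4-1 purely for illustration. Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal Varimax rotation via repeated SVD updates."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.diag((rotated ** 2).sum(axis=0))
        target = loadings.T @ (rotated ** 3 - rotated @ column_ss / p)
        u, s, vt = np.linalg.svd(target)
        rotation = u @ vt                      # product of orthogonal matrices
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break                              # no meaningful improvement
        criterion = new_criterion
    return loadings @ rotation, rotation


# Illustrative subset of Table 4-1: F1 and F2 loadings for
# Log(A100), Log(A119), Log(A155) and Log(A174).
unrotated = np.array([[-0.292, 0.802],
                      [0.902, 0.085],
                      [-0.346, 0.705],
                      [-0.871, -0.018]])
rotated, rotation = varimax(unrotated)

print("communalities before:", (unrotated ** 2).sum(axis=1).round(3))
print("communalities after: ", (rotated ** 2).sum(axis=1).round(3))
```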

Table 4-4: Factor Loadings After Varimax Rotation

Variable              D1      D2      D3      D4      D5      D6      D7      D8      D9     D10
Log(A100)         -0.118   0.914   0.173   0.018  -0.042   0.024  -0.026  -0.023   0.027   0.030
A110               0.302  -0.218  -0.112  -0.054  -0.104  -0.275   0.133   0.134   0.016  -0.032
A113               0.236   0.110  -0.042   0.101  -0.112  -0.124   0.355  -0.077   0.077   0.111
A114               0.317  -0.076  -0.062  -0.226  -0.196  -0.085  -0.016  -0.274  -0.233  -0.002
A116              -0.337   0.423  -0.085   0.002  -0.189   0.168   0.162  -0.231   0.291  -0.115
Log(A118)         -0.040   0.496   0.069  -0.066   0.037   0.165  -0.108   0.018   0.546  -0.111
Log(A119)          0.899  -0.159   0.063   0.137   0.022  -0.096  -0.026   0.057   0.081  -0.030
A120              -0.522   0.372  -0.084   0.040  -0.104   0.267   0.164  -0.186   0.102   0.010
A122              -0.482   0.257  -0.225   0.148   0.049   0.356   0.119   0.113  -0.079  -0.209
A124              -0.158   0.159   0.098  -0.233  -0.207  -0.113  -0.032   0.057   0.044   0.141
Log(A126)         -0.174  -0.524  -0.015  -0.318  -0.011  -0.152  -0.025  -0.118   0.024  -0.104
Log(A130)         -0.273   0.207   0.031  -0.112   0.027   0.292   0.058   0.116  -0.027   0.353
Log(A133)          0.435  -0.096  -0.230   0.124   0.206   0.107   0.406   0.217  -0.007  -0.099
A135               0.288  -0.214  -0.045   0.117  -0.007   0.274  -0.064   0.476  -0.031  -0.085
Log(A136+0.001)    0.018  -0.049  -0.024   0.115  -0.111   0.301   0.057  -0.114  -0.213  -0.096
A141              -0.461  -0.350  -0.257  -0.111   0.106  -0.055   0.110  -0.142   0.097  -0.109
Log(A144)          0.100   0.064   0.130  -0.085   0.008   0.641  -0.061   0.090   0.049  -0.028
Log(A152)         -0.382   0.349  -0.279  -0.005   0.216   0.297   0.148   0.172   0.042   0.306
Log(A153)         -0.060  -0.234  -0.181  -0.036   0.207  -0.136  -0.065  -0.436  -0.016   0.119
Log(A155)         -0.172   0.789   0.135   0.033   0.472   0.026  -0.001  -0.040   0.048   0.028
Log(A159+0.001)    0.057  -0.072  -0.077   0.034  -0.003   0.021   0.046   0.058   0.063  -0.666
Log(A160+0.001)    0.217   0.026  -0.120  -0.002  -0.086   0.145   0.177   0.020  -0.011  -0.352
Log(A166)          0.151  -0.128  -0.005   0.008  -0.819  -0.016  -0.021  -0.021  -0.005  -0.027
Log(A167+0.001)   -0.509  -0.001  -0.251   0.530   0.134  -0.048   0.108  -0.042  -0.034  -0.021
Log(A172+0.001)   -0.020   0.067   0.832  -0.063   0.028   0.021  -0.004   0.040   0.010   0.048
Log(A174)         -0.878   0.106   0.076   0.074   0.100  -0.082  -0.073   0.018  -0.041  -0.047
A175               0.596  -0.244  -0.228   0.001  -0.022   0.054   0.116   0.047  -0.233   0.048
A177               0.532  -0.060   0.006  -0.244   0.062  -0.056   0.132   0.202  -0.234   0.118
Log(A180)         -0.251  -0.122  -0.138   0.406   0.042   0.086  -0.065  -0.067   0.069  -0.010
Log(A181)          0.134  -0.302  -0.019  -0.089  -0.061  -0.033   0.040   0.000  -0.082   0.080
Log(A182)          0.430   0.190   0.277   0.154   0.040  -0.197   0.020   0.024   0.288   0.246
Log(A184)          0.157  -0.230  -0.090  -0.396   0.090   0.154   0.318  -0.209  -0.028   0.016
A185               0.731  -0.014   0.100   0.222   0.132   0.057   0.005  -0.048  -0.018   0.026
A186               0.230   0.170   0.087   0.162   0.297  -0.059   0.052  -0.181  -0.020   0.392
A190              -0.869   0.042   0.045   0.168   0.076   0.084  -0.196   0.018   0.213   0.042
A192               0.632   0.038  -0.061  -0.150  -0.064   0.149  -0.078   0.007  -0.213  -0.034
Log(A193)         -0.838  -0.009  -0.001  -0.042   0.034   0.120  -0.007  -0.033  -0.023  -0.004
Log(A209)          0.050   0.299   0.231   0.017   0.137  -0.066  -0.180   0.261  -0.008  -0.111
A211              -0.022  -0.067  -0.018  -0.115  -0.175   0.070   0.070  -0.326  -0.299  -0.047
Log(A215)         -0.831   0.138   0.047   0.002   0.045   0.146  -0.014   0.156  -0.014   0.031
Log(A216)         -0.380   0.121  -0.162   0.019   0.118   0.264  -0.484   0.089   0.132   0.056
Log(A221+0.001)    0.001   0.316   0.874  -0.002   0.001   0.024  -0.027  -0.007   0.019   0.006
A225               0.305  -0.115   0.369  -0.326   0.148   0.045  -0.198   0.102   0.159   0.098
Log(A231+0.001)    0.899  -0.009  -0.098  -0.047   0.030   0.112   0.098   0.135  -0.044  -0.055
A236              -0.086   0.172   0.015   0.103  -0.119   0.033   0.114  -0.145   0.026  -0.090
A239              -0.206   0.186   0.068  -0.001   0.118   0.158   0.201  -0.040  -0.096  -0.088
Log(-A243)         0.657   0.542   0.180   0.177  -0.185  -0.021  -0.100   0.070   0.069  -0.028
A244              -0.009  -0.042  -0.049   0.045   0.008   0.111  -0.038   0.058  -0.135   0.016
Log(A246+0.001)    0.508  -0.221  -0.136   0.000  -0.044   0.010   0.244   0.010   0.163   0.060
A247              -0.076   0.106   0.068   0.104   0.077   0.433  -0.015  -0.115   0.150   0.034
Log(A248)         -0.023   0.511   0.085  -0.012  -0.083   0.068  -0.390  -0.104  -0.165  -0.017
A250               0.189   0.116  -0.046   0.569  -0.040  -0.001  -0.190   0.117   0.023   0.011
A252              -0.213   0.579   0.068  -0.269   0.046  -0.192  -0.077   0.158   0.001   0.143
A253              -0.764  -0.103  -0.150   0.117   0.127  -0.218   0.133   0.037  -0.042   0.035
A257              -0.637   0.095  -0.230  -0.178   0.102   0.054   0.366   0.058  -0.305   0.034
A259               0.540  -0.024  -0.145  -0.317   0.091   0.103   0.320  -0.103  -0.282   0.016
Log(A262)          0.480  -0.094   0.247  -0.051   0.071  -0.185  -0.457   0.176   0.136   0.216
Log(A263)          0.366  -0.206  -0.003  -0.158   0.055  -0.037   0.582   0.053  -0.031   0.027
Log(A264)          0.653  -0.103   0.070  -0.073   0.141  -0.072   0.321   0.343   0.120   0.051
Log(A266+0.001)    0.375   0.030   0.103   0.046   0.107  -0.202   0.043   0.324  -0.047  -0.111

Again, loadings above 0.5 are highlighted. One of the key differences between
these loadings and the loadings before rotation is that now there is at least one variable
heavily loaded on each of the retained factors, with the exception of Factor 8.
Before rotation, no variables loaded heavily on factors 7 through 10. Another
preliminary finding is that many variables, 18 of the 60, load quite heavily on the first
principal factor. This suggests that the first factor may be an umbrella encompassing
many attributes across the spectrum of national stability.

Table  4-5  provides  a  list  of  the  variables  with  loadings  above  0.5  on  each  of  the 
factors,  as  well  as  suggested  labels  for  each.  For  Factor  8,  for  which  no  variables  loaded 
above  0.5,  the  variables  listed  had  loadings  of  0.48  and  -0.44  respectively.  Variables  in 
bold  loaded  negatively  on  the  given  factor. 

Table 4-5: Characterization of Principal Factors

Factor 1: The Big Picture - This factor encompasses the vast majority of variables experts use in determining the
overall status of a country, and determining national stability.
    Log(A231+0.001)   Carbon dioxide emissions (CO2), metric tons of CO2 per capita (CDIAC)
    Log(A119)         GDP Per Capita
    A185              Urban population (% of total)
    Log(-A243)        Balance of Payments: imports of goods, free on board, US$ (IMF)
    Log(A264)         Number of Recorded Drug Crimes Per 1000 Pop
    A192              Children 1 year old immunized against measles, percentage
    A175              Ratio of female to male enrollments in tertiary education
    A259              Enrolment in total secondary. Public and private. All programs. Total %
    A177              Ratio of girls to boys in primary and secondary education (%)
    Log(A246+0.001)   Exchange rate, US$ per national currency (IMF)
    Log(A167+0.001)   Population growth (annual %)
    A120              Political Terror Rating
    A257              School age population. Tertiary. Total %
    A253              School age population. Primary. Total %
    Log(A215)         Tuberculosis death rate per 100,000 population
    Log(A193)         Population undernourished, percentage
    A190              Children under five mortality rate per 1,000 live births
    Log(A174)         Pupil-teacher ratio, primary

Factor 2: Sustainability - This factor seems to capture a country's population and their ability to provide for it. Also
included is the Count of Entries, which measures various organizations’ ability/desire to collect data on each nation.
    Log(A100)         Population
    Log(A155)         Land area (sq. km)
    A252              Count of entries in database
    Log(-A243)        Balance of Payments: imports of goods, free on board, US$ (IMF)
    Log(A248)         Imports of goods and services, current prices
    Log(A126)         Aid per capita (current US$)

Factor 3: Women's Rights
    Log(A221+0.001)   Seats held by women in national parliament
    Log(A172+0.001)   Proportion of seats held by women in national parliament (%)

Factor 4: Population Growth
    A250              Migration, international net rate per year
    Log(A167+0.001)   Population growth (annual %)

Factor 5: Crowdedness
    Log(A166)         Population density (people per sq. km)

Factor 6: Economic Growth
    Log(A144)         GDP per capita growth (annual %)

Factor 7: Crime Rate
    Log(A263)         Number of Recorded Murders Attempted Per 1000 Pop

Factor 8: Openness
    A135              Exports of goods and services (% of GDP)
    Log(A153)         International tourism, expenditures (% of total imports)

Factor 9: Displaced Persons
    Log(A118)         Refugees

Factor 10: Military Focus
    Log(A159+0.001)   Military expenditure (% of GDP)

The insights gained from the characterization of the principal factors suggested
two methods for selecting variables for constructing a discriminant model. The first was
to select the variable most heavily loaded on each of the ten factors listed above, starting
with the Big Picture factor. Although our hypothesis was that successful discrimination
was possible with no more than ten variables, we continued by selecting the second most
important variable from each factor to see whether additional variables improved the model.

Our  second  method  was  to  focus  on  the  two  factors  which  explained  the  majority  of  the 
variation  in  the  data,  Big  Picture  and  Sustainability.  We  chose  variables  in  order  of 
factor  loading  on  the  Big  Picture  factor  until  all  loadings  greater  than  0.5  were  exhausted, 
then  moved  on  to  Sustainability.  Table  4-6  provides  the  variables  chosen  using  both 
approaches.  The  results  of  the  DA  are  provided  in  Section  4.3. 
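
Both selection rules can be expressed compactly. The sketch below is illustrative only: it uses five rows and the first three rotated factors from Table 4-4, and the function names top_per_factor and heavy_on_factor are hypothetical.

```python
# Rotated loadings on D1, D2, D3 for a few variables (values from Table 4-4).
loadings = {
    "Log(A119)":       [0.899, -0.159, 0.063],
    "Log(A174)":       [-0.878, 0.106, 0.076],
    "Log(A100)":       [-0.118, 0.914, 0.173],
    "Log(A155)":       [-0.172, 0.789, 0.135],
    "Log(A221+0.001)": [0.001, 0.316, 0.874],
}


def top_per_factor(table, n_factors):
    """First method: the most heavily loaded variable on each factor."""
    return [max(table, key=lambda name: abs(table[name][f]))
            for f in range(n_factors)]


def heavy_on_factor(table, factor, cutoff=0.5):
    """Second method: variables above the cutoff, largest magnitude first."""
    heavy = [name for name in table if abs(table[name][factor]) > cutoff]
    return sorted(heavy, key=lambda name: abs(table[name][factor]), reverse=True)


first_method = top_per_factor(loadings, n_factors=3)
second_method = heavy_on_factor(loadings, 0) + heavy_on_factor(loadings, 1)
print(first_method)
print(second_method)
```

Loadings are compared by absolute value, so variables that load negatively on a factor are still eligible, consistent with the negative loaders listed in Table 4-5.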


Table 4-6: Variable Selection from Factor Analysis

Selecting Variables from All Factors (in order of selection):
    Carbon dioxide emissions, metric tons of CO2 per capita
    Population
    Proportion of seats held by women in national parliament
    Number of Recorded Crimes Per 1000 Pop
    Migration, international net rate per year
    GDP per capita growth (annual %)
    Imports of goods and services, current prices
    Population density (people per sq. km)
    Military expenditure (% of GDP)
    GDP Per Capita
    Land area (sq. km)
    Seats held by women in national parliament
    School age population. Tertiary. Total %
    Population growth (annual %)
    GDP annual growth rate, 1990 prices, US$
    Children under five mortality rate per 1,000 live births
    Refugees
    Pupil-teacher ratio, primary
    Count of entries
    Population undernourished, percentage

Selecting Variables from Factors 1 and 2 (in order of selection):
    Carbon dioxide emissions, metric tons of CO2 per capita
    GDP Per Capita
    Children under five mortality rate per 1,000 live births
    Pupil-teacher ratio, primary
    Population undernourished, percentage
    Tuberculosis death rate per 100,000 population
    School age population. Primary. Total %
    Urban population (% of total)
    Number of Recorded Drug Crimes Per 1000 Pop
    Balance of Payments: imports of goods, free on board, US$ (IMF)
    Children 1 year old immunized against measles, percentage
    Ratio of female to male enrollments in tertiary education
    School age population. Tertiary. Total %
    Enrolment in total secondary. Public and private. All programs. Total %
    Ratio of girls to boys in primary and secondary education (%)
    Political Terror
    Population growth (annual %)
    Exchange rate, US$ per national currency (IMF)
    Population
    Land area (sq. km)



4.2.2.  Variable  Selection  via  Discriminant  Analysis 

As described in Chapter 3, variable selection within DA was accomplished in two
ways: Stepwise Forward Selection, and Significance within the Full Model. The
resulting prioritization of variables is shown below.


Table 4-7: Variable Selection from Discriminant Analysis

Forward Stepwise Selection (in priority order):
    Balance of Payments: imports of goods, free on board, US$ (IMF)
    Population undernourished, percentage
    Aid per capita (current US$)
    Political Terror
    School age population. Tertiary. Total %
    Children under five mortality rate per 1,000 live births
    Land area (sq. km)
    Tuberculosis death rate per 100,000 population
    School age population. Primary. Total %
    Political Rights
    Share of women in wage employment in the non-agricultural sector
    Agricultural production index, 1999-2001 = 100
    Enrolment in total secondary. Public and private. All programs. Total %
    Inflation, GDP deflator (annual %)
    Largest Ethnic Group %
    GDP per capita growth (annual %)
    Number of Disaster Related Deaths (Zero when empty)
    Children 1 year old immunized against measles, percentage
    Use of IMF credit (DOD, current US$)
    Balance of Payments: trade balance, goods and services, US$ (IMF)

Full Model Selection (in priority order):
    Balance of Payments: imports of goods, free on board, US$ (IMF)
    Population
    Population undernourished, percentage
    Aid per capita (current US$)
    Political Terror
    GDP Per Capita
    Children under five mortality rate per 1,000 live births
    Land area (sq. km)
    School age population. Tertiary. Total %
    Tuberculosis death rate per 100,000 population
    Pupil-teacher ratio, primary
    Carbon dioxide emissions (CO2), metric tons of CO2 per capita (CDIAC)
    School age population. Primary. Total %
    Political Rights
    Food imports (% of merchandise imports)
    % time in conflict 1990-2003
    Number of Recorded Crimes Per 1000 Pop
    Share of women in wage employment in the non-agricultural sector
    Population growth (annual %)
    Number of Recorded Drug Crimes Per 1000 Pop


Following  variable  selection,  we  performed  Discriminant  Analysis  using 
Barnett’s  Core,  Rim,  Gap  Classification  with  each  of  the  four  sets  of  variables  defined 
above.  The  results  of  the  DA  for  each  of  the  models  are  provided  next. 

4.3. Discriminant Analysis - Barnett


Our  initial  classification,  based  on  Thomas  Barnett’s  work,  is  shown  in  Table  4-8. 
He  divides  the  countries  of  the  World  into  three  main  categories.  While  these  categories 
are  not  specifically  defined  as  stable  or  failing  states,  they  do  provide  an  open  source 
proxy  for  the  classification.  The  first  category  is  the  “Old  Functioning  Core”  which  is 
made  up  of  nations  whose  economies  are  integrated  with  the  rest  of  the  World,  and  who 
are  actively  participating  in  globalization.  The  “Non-Integrated  Gap”  consists  of  nations 
largely  left  out  of  the  global  integration  process.  In  between  are  states  which  are  working 
to  become  part  of  the  global  economy,  but  for  various  reasons  may  not  be  considered 
fully integrated at this time. Several Rim States, such as China, Russia, Brazil and India,
are further along in that process and make up the “New Core” (Barnett Glossary Online:
http://www.thomaspmbarnett.com/glossary.htm).


Table 4-8: Initial Classification

[http://www.thomaspmbarnett.com/glossary.htm]

Integrated Core        New Core and Rim States    Non-Integrated Gap
North America          China                      Caribbean Rim
Europe                 Russia                     Andean South America
Japan                  Argentina                  Africa (except South Africa)
Industrialized Asia    Brazil                     Portions of the Balkans
Australia              Chile                      Caucasus
New Zealand            Mexico                     Central Asia
                       South Africa               Middle East
                       Morocco                    Southeast Asia
                       Algeria
                       Greece
                       Turkey
                       Pakistan
                       India
                       Thailand
                       South Korea
                       Malaysia
                       Philippines
                       Indonesia

We first tested the classification to see if there were statistically significant
differences among groups, based on our data. For this we computed Rao’s F-Test for
Wilks’ Λ, and obtained the following results:


Wilks' Lambda test (Rao's approximation):

    Lambda                0.073
    F (Observed value)    6.189
    F (Critical value)    1.281
    DF1                   120
    DF2                   276
    p-value               < 0.0001
    alpha                 0.05

Test interpretation:

H0: The mean vectors of the 3 classes are equal.
Ha: At least one of the mean vectors is different from another.

As the computed p-value is lower than the significance level alpha=0.05, one should reject
the null hypothesis H0, and accept the alternative hypothesis Ha.

The risk of rejecting the null hypothesis H0 while it is true is lower than 0.01%.


This  suggests  that  there  are  differences  among  the  groups,  and  therefore  we  may  be  able 
to  discriminate  countries  based  on  our  initial  classification. 
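
The degrees of freedom and observed F reported above can be approximately recovered from Wilks' lambda. The sketch below uses the standard form of Rao's F approximation and assumes n = 200 countries, p = 60 variables and g = 3 groups, which is consistent with the reported DF1 = 120 and DF2 = 276. The function name rao_f is hypothetical.

```python
import math


def rao_f(wilks_lambda, n, p, g):
    """Rao's F approximation to the distribution of Wilks' lambda."""
    q = g - 1                                   # hypothesis degrees of freedom
    denom = p ** 2 + q ** 2 - 5
    s = math.sqrt((p ** 2 * q ** 2 - 4) / denom) if denom > 0 else 1.0
    m = n - 1 - (p + q + 1) / 2
    df1 = p * q
    df2 = m * s - p * q / 2 + 1
    root = wilks_lambda ** (1 / s)
    f_stat = (1 - root) / root * df2 / df1
    return f_stat, df1, df2


f_stat, df1, df2 = rao_f(0.073, n=200, p=60, g=3)
print(f"F = {f_stat:.3f} on ({df1}, {df2:.0f}) degrees of freedom")
```

The F computed from the rounded lambda of 0.073 differs slightly from the reported 6.189, presumably because the reported lambda is rounded to three decimal places.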

We  next  built  multiple  Discriminant  Functions,  based  on  the  four  sets  of  variables 
selected  earlier.  For  comparison’s  sake,  we  also  discriminated  based  on  the  Factor 
Scores  produced  during  FA,  and  the  Component  Scores  from  Principal  Component 
Analysis  (PCA).  While  we  did  not  use  PCA  in  this  study,  the  results  are  provided  here  as 
it  is  a  common  data  reduction  technique  readers  may  be  familiar  with,  and  it  is  readily 
available  in  most  software  packages.  Adding  it  to  the  analysis  required  very  little  coding 
or  computation  time.  Refer  to  Lattin  et  al,  2003  or  Dillon  et  al,  1984  for  details  on  PCA. 
Recall  that  in  order  to  recreate  the  FA  and  PCA  scores,  an  analyst  would  need  all  60 
variables  used  during  the  analysis.  For  this  reason,  discriminating  based  on  Factor  or 
Component Scores is less desirable from a data collection efficiency point-of-view, but
may  be  useful  for  comparison  as  more  data  is  included. 

Considering  our  goal  of  determining  a  minimal  set  of  variables  required  to 
classify  states,  discriminant  functions  were  built  one  variable  at  a  time.  At  each  iteration, 
we  checked  the  accuracy  of  the  model  by  its  ability  to  classify  states  into  their  prior 
classes.  A  confusion  matrix  shows  the  number  and  percentage  of  countries  classified  into 
each  group  as  compared  to  their  a  priori  designation.  Table  4-9  is  the  confusion  matrix 
for  one  such  iteration.  In  this  case,  all  variables  in  the  dataset  were  used  to  construct  a 
discriminant  function. 


Table 4-9: Confusion Matrix Using All Variables

from \ to   Core   Rim   Gap   Total   % correct
Core          50     0     1      51      98.04%
Rim            0    19     0      19     100.00%
Gap           11     7   112     130      86.15%
Total         61    26   113     200      90.50%

Here  we  see  an  overall  classification  accuracy  of  90.50%.  This  should  represent 
the  optimal  accuracy  we  could  hope  to  achieve  using  any  subset  of  variables  since  all 
data  available  were  used  to  construct  this  model.  It  is  possible  that  one  or  more  variables 
serve  to  confuse  the  situation  and  that  we  could  see  better  results  if  those  variables  were 
removed;  however  we  should  not  expect  to  achieve  significantly  greater  accuracy  using 
less  data  with  the  same  analysis  parameters.  Examining  the  matrix,  we  notice  that  the 
model  does  exceptionally  well  at  identifying  countries  Barnett  classified  as  Core  or  Rim. 
Of the 70 Core and Rim states, only one, New Caledonia, was misclassified. New
Caledonia is a small French territory in the Pacific Ocean off the east coast of
Australia (http://en.wikipedia.org/wiki/New_Caledonia, Feb 2007), and was one of 13
countries we retained for our study for which we found less than half of the data. This
means that, due to the data imputation, the data used to classify New Caledonia came
more from other countries than from the nation itself. Interestingly, as seen later in Table
4-16, the reduced model using only ten variables classifies this country correctly, though
only five of the ten variables were populated.
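
The per-class and overall accuracies in Table 4-9 follow directly from the counts. A minimal sketch, with the counts copied from the table and helper names chosen here for illustration:

```python
# Rows: Barnett's a priori class; columns: class assigned by the model (Table 4-9).
labels = ["Core", "Rim", "Gap"]
confusion = [
    [50, 0, 1],     # Core
    [0, 19, 0],     # Rim
    [11, 7, 112],   # Gap
]

def class_accuracies(matrix):
    """Fraction of each row (true class) that lands on the diagonal."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Diagonal total divided by the number of observations."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

per_class = dict(zip(labels, class_accuracies(confusion)))
overall = overall_accuracy(confusion)

for name, acc in per_class.items():
    print(f"{name}: {acc:.2%}")
print(f"Overall: {overall:.2%}")
```

The same two helpers apply unchanged to any other confusion matrix with the same row and column ordering.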

The  model  does  not  perform  as  strongly  when  identifying  Gap  states,  though 
accuracy  is  still  over  86%.  The  overall  accuracies  for  each  variable  selection  method  as 
progressively  more  variables  are  added  to  the  model  are  plotted  in  Figure  4-2. 

Figure 4-2: Model Accuracy

The three plots show the accuracies of the models as additional features are added
based on the Linear Discriminant Function used in this thesis, as well as two other
methods which are included for comparison. Several key insights can be gleaned from
these figures. First, regardless of the discriminant function used, the ranking of the
variable selection procedures remains fairly constant, with Stepwise and Full Model DA
selection performing slightly better than the others. Simply put, this means that the
variables selected using those methods perform better for discriminating states, though
only by approximately 1%. The similarity in performance is not surprising considering
the overlap in the lists of variables used to construct each model.

Second,  the  marginal  improvement  in  accuracy  with  each  additional  variable 
diminishes  as  the  model  grows,  which  is  to  be  expected,  particularly  if  we  use 
Mahalanobis’  Method  for  building  the  discriminant  functions.  For  example,  if  we  look  at 
the  Mahalanobis  chart  in  the  lower  left  corner,  we  see  that  both  DA  methods  start  out 
with  approximately  87.5%  accuracy  using  only  one  variable,  improve  to  90%  with  seven 
variables,  but  then  appear  to  level  off.  The  improvement  in  accuracy  as  we  add  variables 
to  the  linear  function  appears  more  constant.  Recall  that  we  could  achieve  90.5% 
accuracy  if  we  used  all  60  variables  as  shown  in  Table  4-9.  Since  we  are  able  to  achieve 
this  accuracy  with  fewer  than  10  variables  using  the  Mahalanobis  Method,  this  may  be  a 
technique  worth  exploring  in  future  work. 

One  result  we  found  surprising  initially  was  that  the  dimension  reduction 
techniques,  PCA  and  FA,  did  not  outperform  the  techniques  involving  individual 
variables.  Since  the  first  principal  components  or  first  principal  factors  account  for  a 
significant  amount  of  variation  in  several  variables,  one  might  expect  the  models  using 
the component or factor scores to perform significantly better than those using individual
variables, at least when only one or two features are in the model. However, remember
that the factors capture any variance in the data, regardless of whether or not it is relevant
to the classification. If we reexamine Table 4-4 we see that several of the key
variables  load  very  heavily  on  the  first  few  principal  factors.  Therefore,  these  variables 
may  be  sufficient  proxies  for  the  underlying  factors,  and  they  appear  to  perform  just  as 
well  for  discriminating  states  as  the  factors  constructed  using  all  60  variables.  This  is  the 
first  significant  finding  of  this  study,  and  supports  our  hypothesis  that  states  can  be 
classified  using  as  few  as  ten  objective,  readily  available  measures. 

To  select  our  final  model,  we  compared  the  accuracies  of  each  of  the  constructed 
models.  The  model  based  on  the  variables  chosen  through  Stepwise  Selection 
consistently  out-performed  the  others,  particularly  when  staying  within  our  self-imposed 
limit  of  using  at  most  ten  key  variables.  Therefore,  the  remainder  of  our  results  is  taken 
from  the  10-variable  model  constructed  using  Forward  Stepwise  Selection.  The  variables 
comprising  that  model  are  listed  in  Table  4-10. 


Table 4-10: Variables Used in Final Model

Variable Code   Series Name
Log(-A243)      Balance of Payments: imports of goods, free on board, US$ (IMF)
Log(A193)       Population undernourished, percentage
Log(A126)       Aid per capita (current US$)
A120            Political Terror
A190            Children under five mortality rate per 1,000 live births
Log(A155)       Land area (sq. km)
A257            School age population. Tertiary. Total %
Log(A215)       Tuberculosis death rate per 100,000 population
A122            Political Rights
A225            Share of women in wage employment in the non-agricultural sector

If we consider the nations classified as Core, Rim, or Gap states, we are interested
in whether or not the average value for each of these variables differs significantly across
groups, meaning each variable can significantly contribute to the classification model.
The results of the tests for differences between group means for each variable are shown
in Table 4-11.


Table 4-11: Tests for Differences Between Group Means

Variable      F        DF1   DF2   p-value
Log(-A243)    62.363   2     197   < 0.0001
Log(A193)     54.681   2     197   < 0.0001
Log(A126)     52.514   2     197   < 0.0001
A120          50.266   2     197   < 0.0001
A190          39.896   2     197   < 0.0001
Log(A155)     39.727   2     197   < 0.0001
A257          37.677   2     197   < 0.0001
Log(A215)     34.235   2     197   < 0.0001
A225          22.759   2     197   < 0.0001
A122          30.078   2     197   < 0.0001

From this we see that each of the ten variables does in fact show significant differences
across groups. Constructing discriminant functions based on these variables, we obtain
the linear discriminant functions provided in Table 4-12.
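
For readers who want to see how a single-variable test of group means works, the sketch below computes a one-way F statistic and its degrees of freedom. The toy data are invented purely for illustration and are not taken from the study; with 3 groups and 200 countries the same formulas give the DF1 = 2 and DF2 = 197 shown in Table 4-11.

```python
def one_way_f(groups):
    """One-way ANOVA F statistic for a list of groups of observations.

    Returns (F, df1, df2) with df1 = g - 1 and df2 = N - g.
    """
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group and within-group sums of squares
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df1 = len(groups) - 1
    df2 = n_total - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2

# Hypothetical toy data: three small groups of one variable
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df1, df2 = one_way_f(toy_groups)
print(f"F = {f_stat:.3f} on ({df1}, {df2}) degrees of freedom")

# Degrees of freedom implied by 3 groups and 200 countries
thesis_df = (3 - 1, 200 - 3)
```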


Table 4-12: Discriminant Functions

              Core        Rim         Gap
Intercept     -319.417    -329.287    -309.575
Log(-A243)    19.071      19.242      18.550
Log(A193)     12.654      12.497      13.871
Log(A126)     6.598       5.610       6.234
A120          -6.750      -5.854      -6.250
A190          0.155       0.125       0.165
Log(A155)     -1.727      -1.300      -1.823
A257          1684.570    1754.207    1688.771
Log(A215)     -6.903      -5.709      -6.951
A225          1.005       0.861       0.984
A122          1.392       1.160       1.782

The magnitude of the coefficients on variable 257 warrants further investigation.
Returning to the data, variable 257, Percentage of the Population aged 18-22, is the only
variable retained as a percentage, and not transformed via natural logarithm.
Furthermore, the maximum value achieved is 12.7%, meaning the values are very small in
comparison to other variables. The unusually large coefficients therefore do not have as
drastic an effect as one might imagine.

To  use  these  functions,  we  input  a  country’s  values  for  each  variable,  multiply  by 
the  given  coefficients,  and  sum  the  values  together  with  the  Intercept  value.  For 
example,  if  we  wish  to  classify  Somalia,  we  first  collect  the  necessary  data. 


Table 4-13: Classification Example - Somalia

Variable      Value
Log(-A243)    21.504
Log(A193)     3.784
Log(A126)     3.179
A120          4.000
A190          225.000
Log(A155)     13.349
A257          0.092
Log(A215)     4.787
A225          45.400
A122          6.000

Multiplying  each  of  these  values  by  their  respective  coefficients  and  summing  we  obtain: 


Table 4-14: Somalia Classification Scores

                 Core       Rim        Gap
Somalia Scores   320.000    316.774    326.436

Since Somalia scores highest with the Gap function, we would label it as such.
Alternatively, we could use the canonical discriminant functions, which are the orthogonal
mappings of the observations in discriminant function space. We calculate the canonical
discriminant function scores for each country and determine which group centroid the
country is closest to. The two methods result in identical classifications.
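
The Somalia calculation can be reproduced with a few lines of code. The sketch below uses the coefficients of Table 4-12 and the inputs of Table 4-13; the names `COEFFICIENTS`, `SOMALIA`, and `classification_scores` are chosen here for illustration. Because the published values are rounded to three decimals, the scores match Table 4-14 only to within a few tenths.

```python
GROUPS = ("Core", "Rim", "Gap")

# Linear discriminant function coefficients (Table 4-12), ordered Core, Rim, Gap
COEFFICIENTS = {
    "Intercept":  (-319.417, -329.287, -309.575),
    "Log(-A243)": (19.071, 19.242, 18.550),
    "Log(A193)":  (12.654, 12.497, 13.871),
    "Log(A126)":  (6.598, 5.610, 6.234),
    "A120":       (-6.750, -5.854, -6.250),
    "A190":       (0.155, 0.125, 0.165),
    "Log(A155)":  (-1.727, -1.300, -1.823),
    "A257":       (1684.570, 1754.207, 1688.771),
    "Log(A215)":  (-6.903, -5.709, -6.951),
    "A225":       (1.005, 0.861, 0.984),
    "A122":       (1.392, 1.160, 1.782),
}

# Somalia's (already transformed) variable values (Table 4-13)
SOMALIA = {
    "Log(-A243)": 21.504, "Log(A193)": 3.784, "Log(A126)": 3.179,
    "A120": 4.000, "A190": 225.000, "Log(A155)": 13.349,
    "A257": 0.092, "Log(A215)": 4.787, "A225": 45.400, "A122": 6.000,
}

def classification_scores(values):
    """Intercept plus the coefficient-weighted sum of the inputs, per group."""
    scores = {}
    for j, group in enumerate(GROUPS):
        total = COEFFICIENTS["Intercept"][j]
        for variable, x in values.items():
            total += COEFFICIENTS[variable][j] * x
        scores[group] = total
    return scores

scores = classification_scores(SOMALIA)
predicted = max(scores, key=scores.get)  # group with the highest score
print({g: round(s, 3) for g, s in scores.items()}, "->", predicted)
```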

In  the  case  of  Somalia,  our  classification  agrees  with  Barnett’s.  Looking  at  all 
countries,  we  see  that  the  10-variable  model  achieves  the  following  accuracies. 


Table 4-15: 10-Variable Model Confusion Matrix

from \ to   Core   Rim   Gap   Total   % correct
Core          43     1     7      51      84.31%
Rim            2    17     0      19      89.47%
Gap           17    14    99     130      76.15%
Total         62    32   106     200      79.50%

As  with  all  models  we  explored,  our  final  model  does  well  at  classifying  Core  and 
Rim states, but has higher variability with Gap countries. There are two possible reasons
for misclassification: either our model is insufficient to correctly classify all states, or the
original  classifications  were  incorrect.  That  is,  perhaps  Barnett’s  Core,  Rim,  Gap 
classifications  vary  from  classifications  of  Stable,  Borderline,  Failing  states.  It  is 
important  at  this  point  to  revisit  our  original  classification  and  investigate  the  countries 
which  are  being  misclassified.  Table  4-16  shows  the  states  misclassified  by  our  final 
model,  and  the  probabilities  of  belonging  to  each  group. 

Table 4-16: Nations Misclassified Using 10-Variable Model

Observation              Barnett   Model   Pr(Core)   Pr(Rim)   Pr(Gap)
Belarus                  Core      Gap     0.450      0.055     0.495
Fiji                     Core      Gap     0.311      0.018     0.671
Malta                    Core      Gap     0.478      0.001     0.521
Moldova                  Core      Gap     0.307      0.007     0.686
Mongolia                 Core      Gap     0.162      0.033     0.805
Tonga                    Core      Gap     0.031      0.001     0.968
Vanuatu                  Core      Gap     0.224      0.000     0.776
Hong Kong                Core      Rim     0.323      0.630     0.047
Andorra                  Gap       Core    0.961      0.001     0.039
Barbados                 Gap       Core    0.811      0.000     0.189
Bosnia and Herzegovina   Gap       Core    0.608      0.001     0.391
Bulgaria                 Gap       Core    0.782      0.008     0.210
Cayman Islands           Gap       Core    0.943      0.000     0.057
Costa Rica               Gap       Core    0.559      0.270     0.171
Croatia                  Gap       Core    0.758      0.030     0.212
Cyprus                   Gap       Core    0.918      0.000     0.082
Israel                   Gap       Core    0.726      0.009     0.266
Macedonia                Gap       Core    0.514      0.001     0.485
Mauritius                Gap       Core    0.532      0.062     0.407
Palau                    Gap       Core    0.744      0.005     0.251
Puerto Rico              Gap       Core    0.667      0.002     0.331
Romania                  Gap       Core    0.562      0.396     0.042
Serbia                   Gap       Core    0.738      0.001     0.260
Singapore                Gap       Core    0.762      0.017     0.221
Tunisia                  Gap       Core    0.387      0.271     0.342
Bangladesh               Gap       Rim     0.005      0.710     0.285
Brunei                   Gap       Rim     0.355      0.439     0.206
Colombia                 Gap       Rim     0.117      0.524     0.359
Ecuador                  Gap       Rim     0.085      0.822     0.093
Egypt                    Gap       Rim     0.020      0.924     0.056
Iran                     Gap       Rim     0.000      0.997     0.003
Nigeria                  Gap       Rim     0.007      0.653     0.339
Paraguay                 Gap       Rim     0.001      0.957     0.042
Peru                     Gap       Rim     0.055      0.893     0.052
Qatar                    Gap       Rim     0.300      0.505     0.195
Sudan                    Gap       Rim     0.001      0.669     0.330
Syria                    Gap       Rim     0.007      0.912     0.080
United Arab Emirates     Gap       Rim     0.037      0.928     0.036
Venezuela                Gap       Rim     0.035      0.695     0.270
Chile                    Rim       Core    0.540      0.402     0.057
Greece                   Rim       Core    0.916      0.055     0.029

Recall that had we used our entire set of 60 variables, we would have achieved
approximately 90% accuracy, compared to the 80% accuracy of the 10-variable model.
The difference in classifications can be thought of as the risk associated with using a
reduced variable set. Table 4-17 shows the countries misclassified using the full model.


Table 4-17: Nations Misclassified Using Full Model

Observation              Barnett   Model   Pr(Core)   Pr(Gap)   Pr(Rim)
New Caledonia            Core      Gap     0.258      0.742     0.000
Andorra                  Gap       Core    0.975      0.025     0.001
Barbados                 Gap       Core    0.775      0.225     0.000
Bosnia and Herzegovina   Gap       Core    0.604      0.396     0.000
Costa Rica               Gap       Core    0.969      0.030     0.000
Croatia                  Gap       Core    0.992      0.008     0.000
Cyprus                   Gap       Core    0.971      0.029     0.000
Kuwait                   Gap       Core    0.566      0.432     0.002
Palau                    Gap       Core    0.712      0.276     0.012
Puerto Rico              Gap       Core    0.577      0.423     0.000
Romania                  Gap       Core    0.999      0.001     0.000
Serbia                   Gap       Core    0.790      0.210     0.000
Bangladesh               Gap       Rim     0.002      0.131     0.867
Egypt                    Gap       Rim     0.001      0.027     0.973
Iran                     Gap       Rim     0.000      0.006     0.994
Lesotho                  Gap       Rim     0.010      0.077     0.913
Paraguay                 Gap       Rim     0.001      0.450     0.549
Peru                     Gap       Rim     0.001      0.030     0.969
Syria                    Gap       Rim     0.009      0.167     0.825

Three countries, New Caledonia, Kuwait, and Lesotho,
were misclassified by the full model, but not by the reduced model. Of the 41 states
misclassified by the reduced model, 16 were still misclassified when all 60 variables were
used, all but one of which was previously classified as a Gap Country. This suggests the
possibility that these 16 were not correctly classified initially. It remains for the decision
maker to decide whether or not the additional resources required to collect data on the 50
extra variables are worth the gain of improving the model's accuracy by roughly 10
percentage points. Recall too
that accuracy equivalent to that of the full model may be achieved with the ten variables
if alternate discriminant functions are used. Furthermore, the Barnett classification is
used only as a proxy for state stability.
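
The overlap between the two misclassification lists can be verified with simple set operations. In this sketch the country names are copied from Tables 4-16 and 4-17; the variable names are illustrative only.

```python
# Countries misclassified by the 10-variable model (Table 4-16)
reduced_model = {
    "Belarus", "Fiji", "Malta", "Moldova", "Mongolia", "Tonga", "Vanuatu",
    "Hong Kong", "Andorra", "Barbados", "Bosnia and Herzegovina", "Bulgaria",
    "Cayman Islands", "Costa Rica", "Croatia", "Cyprus", "Israel", "Macedonia",
    "Mauritius", "Palau", "Puerto Rico", "Romania", "Serbia", "Singapore",
    "Tunisia", "Bangladesh", "Brunei", "Colombia", "Ecuador", "Egypt", "Iran",
    "Nigeria", "Paraguay", "Peru", "Qatar", "Sudan", "Syria",
    "United Arab Emirates", "Venezuela", "Chile", "Greece",
}

# Countries misclassified by the full 60-variable model (Table 4-17)
full_model = {
    "New Caledonia", "Andorra", "Barbados", "Bosnia and Herzegovina",
    "Costa Rica", "Croatia", "Cyprus", "Kuwait", "Palau", "Puerto Rico",
    "Romania", "Serbia", "Bangladesh", "Egypt", "Iran", "Lesotho",
    "Paraguay", "Peru", "Syria",
}

missed_by_both = reduced_model & full_model      # misclassified either way
full_only = full_model - reduced_model           # errors unique to the full model
reduced_only = reduced_model - full_model        # errors fixed by adding variables

print(len(reduced_model), len(full_model), len(missed_by_both))
print(sorted(full_only))
```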

We  can  further  analyze  the  classifications  of  various  countries  graphically.  Using 
the  standardized  canonical  discriminant  function  coefficients,  we  can  plot  countries  in 
Discriminant  Function  Space.  As  with  FA,  the  Canonical  Discriminant  Functions  can  be 
characterized according to the variables loading heavily in each dimension.

Figure  4-3  provides  a  visual  representation  of  the  variables  most  useful  for 
discriminating  on  each  dimension.  It  shows  the  direction,  in  discriminant  function  space, 
and  magnitude  of  the  correlations  between  the  variables  and  the  discriminant  functions. 


[Plot: Variable/Discriminant Function Correlations]

Figure 4-3: Correlations of Variables with Discriminant Functions


We see that the first discriminant function is most heavily influenced by the three
variables identified earlier: Imports, Foreign Aid, and Land Area. Recall from the Factor
Analysis  that  these  three  variables  comprised  the  second  principal  factor  which  we 
labeled  “Sustainability”.  Other  variables  also  provide  some  input  into  this  dimension,  but 
a  few  are  far  more  critical  to  the  second  dimension.  Six  of  the  variables  loading  heavily 
on  the  second  discriminant  function  also  loaded  heavily  on  the  first  principal  factor  we 
called  “The  Big  Picture”.  These  are  Imports,  Political  Terror,  Population  aged  18-22, 
Tuberculosis  Death  Rate,  Percent  of  People  Undernourished,  and  Child  Mortality  Rate. 

It  is  no  coincidence  that  the  first  two  Principal  Factors  correspond  to  the  two 
discriminant  functions.  Both  techniques  attempt  to  discover  the  underlying  structure  of 
the  data  by  finding  linear  combinations  which  form  mutually  orthogonal  functions.  It 
should  not  be  surprising  then  that  the  two  methods  produce  similar  pictures  of  the  data’s 
true  structure.  Furthermore,  we  should  expect  that  the  variables  we  found  to  explain  the 
most  variation  in  the  dataset  would  also  be  most  useful  for  classifying  states.  Table  4-18 
provides  the  Canonical  Discriminant  Functions. 


Table 4-18: Canonical Discriminant Functions

              Discriminant   Discriminant
              Function 1     Function 2
Intercept     -5.057         -4.498
Log(-A243)    0.152          0.234
Log(A193)     -0.266         -0.544
Log(A126)     -0.296         0.146
A120          0.240          -0.208
A190          -0.013         -0.005
Log(A155)     0.172          0.049
A257          24.570         -0.791
Log(A215)     0.442          0.040
A225          -0.049         0.007
A122          -0.151         -0.177

Note that the coefficient corresponding to Tuberculosis Death Rate has a
counterintuitive positive sign. One would expect that a lower number of deaths due to
Tuberculosis would be better for a nation's overall status. Montgomery et al describe
three reasons that may explain a variable having the “wrong” sign in this or any
regression model. First, if the range of one of the regressors is small in relation to other
variables in the model, the variance in the estimate of the regression coefficient will be
large, resulting in a lower confidence estimate. In this situation, the range of the
Tuberculosis data is (-1.204, 5.596) compared to, for example, the range of the Child
Mortality Rate, which is (3, 283). Second, severe
multicollinearity can also increase the variance of the coefficient estimates, again
increasing the probability of seeing a counterintuitive sign. Finally, either one or more
important regressors may be left out of the model, or other regressors in the model are
causing the sign to change. The coefficients measure the effects of a variable given that
each of the other variables is in the model. In other words, it may be that Tuberculosis
Deaths do have a negative effect on Function 2 scores if considered alone, but with other
variables already in the model, the net effect on that function may be positive
(Montgomery et al, 2001: 120-2). This appears to be the most likely cause of the positive
Tuberculosis Death Rate coefficient in this case. When all 60 variables are used, we see
that in fact the sign is negative, and Tuberculosis has a coefficient of -0.512. This
suggests that the particular combination of variables used in the reduced model causes the
Function 2 coefficient for Tuberculosis to change sign.
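
The last explanation, a coefficient changing sign once correlated variables enter the model, is easy to demonstrate on a small example. The sketch below uses synthetic data invented for illustration (not the thesis data) and ordinary least squares rather than discriminant analysis: the response rises with `x1` when `x1` is considered alone, yet the coefficient on `x1` is negative once the correlated regressor `x2` is included.

```python
def mean(values):
    return sum(values) / len(values)

def simple_slope(x, y):
    """Slope of y regressed on x alone."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_predictor_ols(x1, x2, y):
    """Slopes (b1, b2) of y regressed on x1 and x2 jointly (normal equations)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2

# Synthetic, strongly correlated regressors; y is built as 2*x2 - x1
x2 = [1, 2, 3, 4, 5, 6, 7, 8]
x1 = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5, 7.5, 7.5]
y = [2 * b - a for a, b in zip(x1, x2)]

alone = simple_slope(x1, y)          # positive: y rises with x1 on its own
b1, b2 = two_predictor_ols(x1, x2, y)  # b1 is negative once x2 is in the model
print(f"x1 alone: {alone:+.2f}; with x2 in the model: {b1:+.2f} (x2: {b2:+.2f})")
```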

A sample of selected countries is plotted in Figure 4-4. The ellipses represent
95% confidence intervals around the group centroids.

[Plot: Select Observations in Discriminant Function Space]

Figure 4-4: Classification of States


It  is  clear  from  the  figure  that  it  is  indeed  more  difficult  to  classify  Gap  countries. 
In  fact,  we  see  that  the  95%  confidence  interval  around  the  Gap  centroid  actually  contains 
the  other  group  centroids.  We  also  see  that  there  is  a  wider  spread  of  countries  which 
were  pre-classified  as  Gap. 

Recall  that  for  the  three  group  case,  two  discriminant  functions  are  used.  While 
our model may not correctly identify every Gap state defined by Barnett, there does
appear to be a distinct separation of the extremely critical nations by the second
discriminant  function,  shown  on  the  vertical  scale.  Countries  scoring  lowest  in  this 
dimension  include  Somalia,  Djibouti,  Congo,  Iraq,  Afghanistan,  and  North  Korea,  while 
Germany,  Japan  and  the  United  States  score  highest.  It  appears  then  that  we  may  have  a 
useful  model  for  identifying  the  most  critical  nations,  and  for  selecting  states  for  further 
analysis. 

To  achieve  a  high  score  on  Function  2,  the  Big  Picture  Function,  a  country  would 
need  high  values  for  variables  with  positive  coefficients,  and  low  values  for  variables 
with  negative  coefficients.  Looking  at  two  examples,  we  see  from  Table  4-16  that  Israel 
was  predefined  to  be  a  Gap  country,  but  was  classified  by  our  model  as  being  part  of  the 
Core.  Conversely,  Mongolia,  which  was  originally  classified  a  Core  country,  has  now 
been  classified  as  a  Gap  country.  Table  4-19  provides  data  for  each  of  these  countries, 
their  scores  from  the  second  discriminant  function,  as  well  as  summary  statistics  for  all 
countries analyzed. The values in parentheses are the “worst” of Israel, Mongolia, and
the Mean.


Table 4-19: Comparison of Function 2 Scores for Israel and Mongolia

                                     Function 2                            Mean       St. Dev.
Variable                             Coefficients   Israel     Mongolia    (All)      (All)
Intercept                            -4.498         1          1
Imports                              0.234          24.502     (20.619)    22.545     2.233
Pop. Undernourished                  -0.544         2.197      (3.332)     2.175      1.013
Foreign Aid                          0.146          4.255      4.646       (3.214)    1.694
Political Terror (Lower is Better)   -0.208         (4.000)    2.000       2.560      1.069
Child Mortality (Per 1000)           -0.005         6.000      52.000      (59.070)   65.864
Land Area                            0.049          (9.986)    14.264      11.265     2.648
Pop % Aged 18-22                     -0.791         0.081      (0.107)     0.088      0.017
Tuberculosis Deaths                  0.040          (-0.105)   3.186       2.174      1.641
Women Share of Workplace             0.007          49.600     50.300      (39.211)   11.524
Political Rights (Lower is Better)   -0.177         1.000      2.000       (3.295)    2.117
Function 2 Scores                                   0.407      -0.708      -0.466

Recall that several of the variables in the model were transformed to more closely
resemble  a  normal  distribution.  This  transformation  explains,  for  example,  the  negative 
value  for  Tuberculosis  Death  Rate  for  Israel.  This  means  their  value  was  low,  but  not 
actually  negative.  Israel  scores  lower  than  the  mean  in  three  of  the  ten  categories,  but 
scores  high  enough  in  other  areas  to  compensate,  and  as  a  result  still  receives  a  Core 
classification.  The  fact  that  Barnett  classified  Israel  as  a  Gap  country  may  be  due  more  to 
its  proximity  to  other  troubled  nations,  which  is  a  factor  not  included  in  our  dataset. 
Mongolia  scores  poorly,  at  least  one  standard  deviation  from  the  mean,  in  three  areas 
including  Level  of  Imports,  Percentage  of  the  Population  Undernourished,  and 
Percentage  of  the  Population  aged  18-22.  These  appear  to  be  the  primary  reasons  for  its 
Gap  classification.  Similar  specific  analyses  can  be  conducted  for  any  nation  of  interest 
in  this  study. 
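
A nation-specific analysis of this kind can be sketched in a few lines. The code below recomputes the Function 2 scores from the coefficients and values in Table 4-19 and then shows how much each variable moves a country's score relative to the all-country mean. The helper names are illustrative, and because the published inputs are rounded, the scores agree with the table only to within a few hundredths.

```python
INTERCEPT = -4.498

# Function 2 coefficients (Table 4-19)
FUNCTION_2 = {
    "Imports": 0.234, "Pop. Undernourished": -0.544, "Foreign Aid": 0.146,
    "Political Terror": -0.208, "Child Mortality": -0.005, "Land Area": 0.049,
    "Pop % Aged 18-22": -0.791, "Tuberculosis Deaths": 0.040,
    "Women Share of Workplace": 0.007, "Political Rights": -0.177,
}

def make_profile(values):
    """Pair a row of Table 4-19 values with the variable names, in table order."""
    return dict(zip(FUNCTION_2, values))

ISRAEL = make_profile([24.502, 2.197, 4.255, 4.000, 6.000, 9.986, 0.081, -0.105, 49.600, 1.000])
MONGOLIA = make_profile([20.619, 3.332, 4.646, 2.000, 52.000, 14.264, 0.107, 3.186, 50.300, 2.000])
MEAN = make_profile([22.545, 2.175, 3.214, 2.560, 59.070, 11.265, 0.088, 2.174, 39.211, 3.295])

def function2_score(profile):
    """Intercept plus the coefficient-weighted sum of the variable values."""
    return INTERCEPT + sum(FUNCTION_2[k] * profile[k] for k in FUNCTION_2)

def contributions_vs_mean(profile):
    """How far each variable pushes the score above (+) or below (-) the mean score."""
    return {k: FUNCTION_2[k] * (profile[k] - MEAN[k]) for k in FUNCTION_2}

israel_score = function2_score(ISRAEL)
mongolia_score = function2_score(MONGOLIA)
mongolia_effects = contributions_vs_mean(MONGOLIA)
largest_drag = min(mongolia_effects, key=mongolia_effects.get)

print(f"Israel: {israel_score:.3f}, Mongolia: {mongolia_score:.3f}")
print("Largest negative contribution for Mongolia:", largest_drag)
```

Sorting the output of `contributions_vs_mean` gives a quick ranking of the variables that pull a particular country toward or away from the Gap end of the scale.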

The  first  discriminant  function,  shown  on  the  horizontal  axis  of  Figure  4-3, 
separates  Rim  countries  from  Core  and  Gap  countries.  As  our  confusion  matrices 
confirm,  it  appears  to  be  easiest  to  segregate  the  Rim  countries  from  the  others.  At  first 
glance,  it  may  seem  counter-intuitive  that  it  is  easier  to  identify  the  states  which  are  by 
definition  hard  to  categorize.  However,  such  speculation  assumes  that  all  countries  lie  in 
only  the  one  dimension,  represented  by  the  vertical  axis.  While  Rim  states  do  lie 
between  Gap  and  Core  states  in  the  second  dimension,  there  are  differences  in  other 
dimensions  that  must  be  considered.  Reexamining  Table  4-12  provides  insight  into 
which  variables  are  particularly  useful  for  identifying  Rim  states. 

Table 4-20: Discriminant Functions (Repeat of Table 4-12)

              Core        Rim         Gap
Intercept     -319.417    -329.287    -309.575
Log(-A243)    19.071      19.242      18.550
Log(A193)     12.654      12.497      13.871
Log(A126)     6.598       5.610       6.234
A120          -6.750      -5.854      -6.250
A190          0.155       0.125       0.165
Log(A155)     -1.727      -1.300      -1.823
A257          1684.570    1754.207    1688.771
Log(A215)     -6.903      -5.709      -6.951
A225          1.005       0.861       0.984
A122          1.392       1.160       1.782

Several  variables  appear  to  have  similar  coefficients  for  Gap  and  Core  countries, 
but  are  different  for  Rim  countries.  Adjusting  the  three  group  model,  we  built  a 
discriminant  function  to  classify  states  only  as  either  Rim  or  Non-Rim.  As  shown  in 
Table  4-21,  the  Sustainability  variables  dealing  with  Imports,  Foreign  Aid,  and  Land 
Area  are  very  significant  for  distinguishing  Rim  states  from  the  other  groups. 


Table 4-21: Significance of Variables for Distinguishing Rim Countries

Variable      F         DF1   DF2   p-value
Log(-A243)    123.445   1     198   < 0.0001
Log(A126)     114.995   1     198   < 0.0001
Log(A155)     105.982   1     198   < 0.0001
A120          19.811    1     198   < 0.0001
A190          14.159    1     198   0.000
A225          12.670    1     198   0.000
Log(A193)     8.430     1     198   0.004
A257          3.491     1     198   0.063
Log(A215)     1.892     1     198   0.171
A122          0.074     1     198   0.786

Therefore, it appears we have a two-function model. One function separates the Rim
countries, while the other distinguishes between The Gap and The Core.

4.4. Discriminant Analysis - Fund for Peace

Our  final  task  was  to  determine  if  the  variables  selected  for  our  final  model  were 
indeed  sufficient  for  classifying  failing  states,  or  if  their  usefulness  was  unique  to 
Barnett’s  classifications.  To  test  this,  we  applied  the  same  DA  techniques  to  the  Fund  for 
Peace  (FFP)  2006  Failed  States  Index.  Recall  the  FFP  publishes  an  annual  Index  which 
provides  scores  for  each  country  indicating  their  current  stability. 

The  FFP  provided  scores  for  146  of  the  200  countries  previously  analyzed,  so 
only  those  nations  were  used  in  this  analysis.  For  consistency’s  sake,  we  divided  the 
nations into three groups. The states are given a score on a scale from 0 to 120 in the
FFP data, but are given no categorical assignment. We therefore needed to choose cut-off
points for each class. As shown in Figure 4-5, the scores themselves do not seem to
provide natural break points between classes.


[Plot: FFP Scores]

Figure 4-5: Fund for Peace Scores

Lacking clear breakpoints, the nations were simply divided into three groups, as close to
equal size as possible. Group 1 consisted of nations posing the highest risk of failing,
Group 2 countries were considered medium risk, and Group 3 was made up of the most
stable countries having the lowest risk of state failure.
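
One way to carry out such a split is sketched below. It assumes that a higher index score means a higher risk of failure, and it uses placeholder scores generated only to exercise the function; they are not FFP data. How the original analysis assigned the leftover countries when the total is not divisible by three is not stated, so the choice made here (extra members go to the earlier groups) is an assumption.

```python
def split_into_risk_groups(scores, labels=("High", "Medium", "Low")):
    """Rank countries from highest to lowest score and cut into near-equal groups."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    base, extra = divmod(len(ranked), len(labels))
    groups, start = {}, 0
    for i, label in enumerate(labels):
        size = base + (1 if i < extra else 0)  # earlier groups absorb the remainder
        groups[label] = ranked[start:start + size]
        start += size
    return groups

# Placeholder scores on a 0-120 scale for 146 hypothetical states
placeholder_scores = {f"State {i:03d}": 15 + 0.7 * i for i in range(146)}

groups = split_into_risk_groups(placeholder_scores)
sizes = {label: len(members) for label, members in groups.items()}
print(sizes)
```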

We proceeded in the same manner as before, foregoing the variable selection
process. Using the ten variables from our final model, we achieved the results in Table
4-22. States pre-classified as High, Medium and Low risk are analogous, but certainly not
equivalent, to the Gap, Rim, and Core classifications from Barnett.


Table 4-22: Confusion Matrix using Fund for Peace Classification

from \ to   High   Med   Low   Total   % correct
High          38    10     0      48      79.17%
Med            7    38     4      49      77.55%
Low            0     8    41      49      83.67%
Total         45    56    45     146      80.14%

We  first  notice  that  the  Discriminant  Function  constructed  using  the  same  ten 
variables  again  achieves  approximately  80%  accuracy.  Just  as  the  discriminant  function 
used  on  Barnett’s  classification  was  better  at  classifying  Core  and  Rim  states,  this 
function  appears  to  do  slightly  better  with  the  Low  Risk  nations.  One  notable 
improvement  is  that  no  countries  previously  identified  as  Low  Risk  were  classified  as 
High  Risk,  and  vice-versa.  The  ambiguities  all  appear  within  the  Medium  Risk 
classifications. 

Examining  the  Discriminant  Functions  themselves,  we  see  definite  consistencies 
between  the  FFP  and  Barnett  models.  Figure  4-6  shows  the  variable-function 
correlations.


Figure 4-6: Comparison of Discriminant Function Structure (panel title: Barnett Variable/Function Correlations)


Fundamentally, the structure of the data remains the same. However, the
orientation has rotated 90 degrees clockwise. Therefore, the Big Picture function,
represented by the second discriminant function under the Barnett case, is now the first
function, shown on the horizontal axis. We notice also that the variables most heavily
loaded on the Sustainability Function before have all but disappeared in the new model, as
evidenced by the shorter radii. In fact, these variables are not significant to the model at
the 0.05 alpha level. This is not surprising, as we have already determined that this
function is orthogonal to the Big Picture function, and should therefore not be expected to
contribute substantially to discrimination along the inherently one-dimensional Failed
States Index. Our original Function 2 seems to capture the majority of what the Fund for
Peace uses to classify states according to their likelihood of failure.

Table 4-23 shows a comparison between the discriminant functions derived from
using both prior classifications.

Table 4-23: Comparison of Discriminant Functions

Barnett Discriminant Functions

Variable          F1       F2*
Intercept     -5.057    -4.498
Log(-A243)     0.152     0.234
Log(A193)     -0.266    -0.544
Log(A126)     -0.296     0.146
A120           0.240    -0.208
A190          -0.013    -0.005
Log(A155)      0.172     0.049
A257          24.570    -0.791
Log(A215)      0.442     0.040
A225          -0.049     0.007
A122          -0.151    -0.177

FFP Discriminant Functions

Variable         F1*        F2
Intercept     -1.504    -0.403
Log(-A243)     0.229    -0.197
Log(A193)     -0.358     0.495
Log(A126)      0.011    -0.011
A120          -0.782     0.006
A190          -0.003    -0.018
Log(A155)     -0.018     0.086
A257           4.181    39.356
Log(A215)      0.050    -0.134
A225           0.007     0.018
A122          -0.320     0.023

* Big Picture function (set in bold in the original table).

The  two  columns  in  bold  represent  the  Big  Picture  function  for  each  of  the 
respective  cases.  On  inspection,  the  two  functions  appear  quite  similar.  The  only 
noticeable  difference  is  variable  257,  Percentage  of  the  Population  aged  18-22.  This 
variable  now  has  a  positive  coefficient  suggesting  that  a  greater  percentage  of  people  of 
this  age  improves  national  stability.  However,  previously  discussed  issues  with  the  scale 
of  this  variable  may  account  for  this  discrepancy. 

Considering  the  similarities  between  the  two  functions,  we  would  expect  a  plot  of 
observations  in  the  new  Discriminant  Function  space  to  be  similar  to  the  original,  except 
rotated. This situation is seen in Figure 4-7.


Figure 4-7: Classification of States - FFP (plot title: Select Observations in FFP Discriminant Function Space)


The position of countries along the horizontal in this plot is highly coincident with
the vertical axis from the Barnett case. We also see that the Medium Risk centroid is
much closer to being directly between the other two groups, again suggesting the
one-dimensionality of the Failed States Index.

Most  countries  are  positioned  in  conjunction  with  their  original  locations  from  the 
previous  observation  plot.  The  one  notable  exception  is  Afghanistan  which  is  now  far 
removed from the rest of the countries, due to its unusually low “New Function 2” score.
Upon closer examination, we see from Figure 4-6 that the variable most heavily loaded in
this  dimension  is  190,  Child  Mortality  Rate.  Afghanistan  ranked  third  from  the  bottom 
across  all  200  nations  observed,  with  a  rate  of  257  deaths  per  1000  children  under  five 
years  of  age.  The  two  countries  scoring  worse,  Sierra  Leone  and  Angola,  are  not  plotted. 
Afghanistan’s  Child  Mortality  Rate  represents  a  three  standard  deviation  departure  from 
the  mean  of  59  per  1000. 

Appendix  E  provides  a  comparison  between  the  2006  FFP  Index  and  the  scores 
each  nation  received  on  the  Big  Picture  Function  resulting  from  Barnett’s  classification. 

4.5.  Failing  States 

The  preceding  analyses  appear  promising  for  aiding  in  our  goal  of  identifying 
those  states  most  likely  to  be  considered  failing.  Both  FA  and  DA  highlighted  a 
quantifiable  dimension  that  provides  a  documented,  tractable  analysis  of  the  overall  status 
of  nations.  This  dimension  is  characterized  by  the  Big  Picture  factor,  and  well  measured 
by  the  second  discriminant  function  when  Barnett’s  initial  classification  was  used,  and 
the  first  discriminant  function  when  we  used  the  FFP  Failed  States  Index.  States  which 
have  very  low  scores  on  either  of  these  functions  may  be  those  most  in  danger  of  failing 
and should be subjected to further analysis by stability experts. Table 4-24 is a ranked
listing of the 30 nations scoring lowest on our final model, along with their Prior and
Posterior classifications. Whether or not these nations are indeed most in need of
intervention  is  left  to  experts  in  other  fields.  This  analysis  provides  analytic  support  to 
the  idea  that  there  are  substantial  differences  between  these  and  other  nations  of  the 
world,  and  a  way  to  quantify  those  differences.  It  offers  a  quantitative  method  to  screen 
states for further analysis based on the collection of a parsimonious set of indicators.

Table 4-24: States with Lowest Function 2 Scores

Observation                 Barnett   Model   Pr(Core)   Pr(Gap)   Pr(Rim)       F1       F2
Burundi                     Gap       Gap        0.001     0.999     0.000   -2.322   -3.356
Congo, DRC                  Gap       Gap        0.001     0.997     0.002   -0.681   -3.147
Sierra Leone                Gap       Gap        0.001     0.999     0.000   -3.913   -3.120
Haiti                       Gap       Gap        0.001     0.997     0.002   -0.743   -3.017
Equatorial Guinea           Gap       Gap        0.001     0.999     0.000   -3.121   -2.980
Rwanda                      Gap       Gap        0.001     0.999     0.000   -2.783   -2.952
Eritrea                     Gap       Gap        0.001     0.998     0.001   -1.169   -2.904
Somalia                     Gap       Gap        0.002     0.998     0.000   -2.059   -2.882
Zimbabwe                    Gap       Gap        0.001     0.776     0.223    1.046   -2.860
Guinea-Bissau               Gap       Gap        0.002     0.998     0.000   -3.339   -2.857
Chad                        Gap       Gap        0.002     0.990     0.008   -0.324   -2.767
Togo                        Gap       Gap        0.002     0.996     0.003   -0.741   -2.753
Angola                      Gap       Gap        0.002     0.998     0.000   -1.790   -2.748
Comoros                     Gap       Gap        0.003     0.997     0.000   -2.676   -2.661
Central African Republic    Gap       Gap        0.003     0.996     0.000   -1.447   -2.565
Afghanistan                 Gap       Gap        0.002     0.762     0.236    0.940   -2.560
Malawi                      Gap       Gap        0.003     0.987     0.010   -0.348   -2.535
Korea, North                Gap       Gap        0.003     0.992     0.005   -0.604   -2.529
Cameroon                    Gap       Gap        0.003     0.992     0.005   -0.575   -2.526
Tajikistan                  Gap       Gap        0.004     0.996     0.000   -2.166   -2.472
Niger                       Gap       Gap        0.004     0.996     0.000   -1.585   -2.468
Ethiopia                    Gap       Gap        0.004     0.994     0.002   -1.049   -2.418
Yemen                       Gap       Gap        0.003     0.758     0.239    0.884   -2.418
Liberia                     Gap       Gap        0.005     0.995     0.000   -2.816   -2.396
Guinea                      Gap       Gap        0.005     0.995     0.000   -1.755   -2.393
Uzbekistan                  Gap       Gap        0.005     0.979     0.017   -0.237   -2.330
Nepal                       Gap       Gap        0.003     0.705     0.292    0.944   -2.330
Sudan                       Gap       Rim        0.001     0.330     0.669    1.525   -2.314
Congo                       Gap       Gap        0.006     0.991     0.003   -0.854   -2.271
Iraq                        Gap       Gap        0.006     0.985     0.010   -0.461   -2.264


4.6.  Chapter  Summary 

This  chapter  outlines  the  results  of  the  various  analyses  performed  in  this  study. 
The  combined  results  of  the  Factor  Analysis  and  Discriminant  Analysis  provide  several 
key  insights  to  assist  in  identifying  failing  or  failed  states. 

There are in fact statistically significant differences between countries which have
been previously classified as Core, Gap, or Rim across a wide range of available data.
Furthermore, the discriminant function which separates the Gap and Core countries
appears useful for our purpose of distinguishing failing and borderline states from more
stable nations, and provides a scale on which to measure their instability. In addition,
the differences between states are detectable and measurable using as few as ten carefully
chosen variables, which are currently being collected by various agencies and are available
from open sources. One such set of variables is as follows:

Value of Imports
Percentage of Population Undernourished
Amount of Foreign Aid
Political Terror Rating
Children Under Five Mortality Rate
Land Area
Youth Bulge - Percentage of People Aged 18-22
Tuberculosis Death Rate
Percentage of Women Comprising the Workplace
Political Rights

Using these variables, with appropriate transformations, states may be categorized in
terms of their overall status by multiplying a country’s value on each of these variables
by the classification matrix shown in Table 4-25, and assigning it to the group receiving
the highest score.

Table 4-25: Discriminant Functions

Variable          Core        Rim        Gap
Intercept     -319.417   -329.287   -309.575
Log(-A243)      19.071     19.242     18.550
Log(A193)       12.654     12.497     13.871
Log(A126)        6.598      5.610      6.234
A120            -6.750     -5.854     -6.250
A190             0.155      0.125      0.165
Log(A155)       -1.727     -1.300     -1.823
A257          1684.570   1754.207   1688.771
Log(A215)       -6.903     -5.709     -6.951
A225             1.005      0.861      0.984
A122             1.392      1.160      1.782
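
The assignment rule can be sketched in a few lines. The coefficients below are copied from Table 4-25; everything else is an assumption made for illustration. In particular, the function name classify is hypothetical, and the inputs are assumed to be already transformed (logged where the table indicates) and expressed on the same scale used when the model was fitted.

```python
# Variable order follows Table 4-25.
VARIABLES = ["Log(-A243)", "Log(A193)", "Log(A126)", "A120", "A190",
             "Log(A155)", "A257", "Log(A215)", "A225", "A122"]

# First entry of each list is the intercept, followed by one weight per variable.
CLASSIFICATION_FUNCTIONS = {
    "Core": [-319.417, 19.071, 12.654, 6.598, -6.750, 0.155,
             -1.727, 1684.570, -6.903, 1.005, 1.392],
    "Rim": [-329.287, 19.242, 12.497, 5.610, -5.854, 0.125,
            -1.300, 1754.207, -5.709, 0.861, 1.160],
    "Gap": [-309.575, 18.550, 13.871, 6.234, -6.250, 0.165,
            -1.823, 1688.771, -6.951, 0.984, 1.782],
}


def classify(transformed_values):
    """Return (group with the highest score, all scores) for one state.

    transformed_values: ten numbers in VARIABLES order, already transformed.
    """
    if len(transformed_values) != len(VARIABLES):
        raise ValueError("expected one value per variable")
    scores = {}
    for group, coefs in CLASSIFICATION_FUNCTIONS.items():
        intercept, weights = coefs[0], coefs[1:]
        scores[group] = intercept + sum(
            w * x for w, x in zip(weights, transformed_values))
    return max(scores, key=scores.get), scores
```

Only the differences between the three scores matter for the assignment; the scores themselves have no stand-alone interpretation.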

In addition, multiplying by the appropriate canonical function shown in Table 4-26,
we obtain a Big Picture score indicating the likelihood a country may be in crisis.


Table 4-26: Canonical Discriminant Functions

Variable                      Big Picture Function
Intercept                                   -4.498
Imports                                      0.234
Pop. Undernourished                         -0.544
Foreign Aid                                  0.146
Political Terror                            -0.208
Child Mortality (Per 1000)                  -0.005
Land Area                                    0.049
Pop % Aged 18-22                            -0.791
Tuberculosis Deaths                          0.040
Women Share of Workplace                     0.007
Political Rights                            -0.177
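
As a rough sketch of how the Big Picture score could be computed and used for screening, the weights from Table 4-26 can be applied as a simple weighted sum. The function names big_picture_score and rank_lowest are hypothetical, and the inputs are again assumed to be transformed in the same way as the data used to fit the function.

```python
BIG_PICTURE_INTERCEPT = -4.498
BIG_PICTURE_WEIGHTS = {
    "Imports": 0.234,
    "Pop. Undernourished": -0.544,
    "Foreign Aid": 0.146,
    "Political Terror": -0.208,
    "Child Mortality (Per 1000)": -0.005,
    "Land Area": 0.049,
    "Pop % Aged 18-22": -0.791,
    "Tuberculosis Deaths": 0.040,
    "Women Share of Workplace": 0.007,
    "Political Rights": -0.177,
}


def big_picture_score(values):
    """values: {variable name: transformed value} for one state."""
    missing = set(BIG_PICTURE_WEIGHTS) - set(values)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return BIG_PICTURE_INTERCEPT + sum(
        weight * values[name] for name, weight in BIG_PICTURE_WEIGHTS.items())


def rank_lowest(states, count):
    """Return the names of the `count` states with the lowest scores."""
    scores = {name: big_picture_score(vals) for name, vals in states.items()}
    return sorted(scores, key=scores.get)[:count]
```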

Countries scoring lowest are those we may consider most in danger of failing, and
consequently becoming attractive to terrorist groups. This result is validated by the
demonstrated ability to classify states on the crisis scale proposed by the Fund for Peace
using these same variables. Furthermore, this list of key indicators suggests areas which
may serve as focal points for international assistance. Finally, the methodology and
analysis can be repeated using any available data or for any official classification of states
to provide discrimination and screening of potentially troubled areas for further study, aid,
support, or intervention.

5. Conclusions and Recommendations


5.1.  Introduction 

This  chapter  concludes  this  study  with  a  summary  of  the  significant  research 
contributions  made  through  this  effort,  and  some  areas  for  future  research. 

5.2.  Research  Contributions 

This  thesis  provides  analytic  support  to  the  classification  of  states  as  Stable, 
Borderline,  or  Failing.  It  provides  a  list  of  key  variables,  available  open  source,  which 
can  be  used  to  determine  the  crisis  level  of  a  nation,  and  a  function  for  calculating  this 
level.  Both  Factor  Analysis  and  Discriminant  Analysis  are  useful  for  exposing  the  true 
structure  of  the  myriad  of  data  available  on  states.  DA  has  also  been  shown  to  be  a 
valuable  tool  for  identifying  which  states  are  more  likely  to  require  assistance.  The  data 
collection  and  analysis  accomplished  in  this  thesis  lay  the  groundwork  for  identifying  key 
areas  of  future  concern  for  the  US  in  the  continuing  hunt  for  terrorist  cells. 

5.3.  Recommendations  for  Future  Research 

The  following  sections  suggest  some  areas  where  additional  gains  could  be  made 
through  future  analysis.  By  no  means  does  this  represent  an  exhaustive  list.  The  reader 
is  encouraged  to  consider  the  possible  applications  of  the  methods  discussed  in  this 
thesis,  as  well  as  additional  techniques  for  predicting  failing  states. 

5.3.1.  Time-Series  Analysis 

Clearly, the true importance of this work will be realized if we can extend it to the
prediction of failing states. To this point we have determined the variables or indicators
most useful for classifying states as failing, marginal, or stable, and identified nations
which have reached crisis stage. The next step is to examine the history of these states.
Knowing what measures are most important, we can assess these indicators throughout
the critical period when the nations fell. We can then use this information to identify
states which are currently following the same negative trends. This has the potential to
provide the international community several years’ advance notice in which to plan and
carry out some form of intervention to prevent crises and the spread of terrorism. This
should be the first priority of any follow-on effort.

5.3.2.  Suggested  Applications  of  Analysis  Techniques 

The techniques used to analyze nations in this thesis could easily be extended to
myriad other groups. For example, within our own country we may be interested in
identifying states, cities, or areas within cities where we could expect economic depression
or an outbreak in crime. The conditions leading to such circumstances are similar to those of
failing states. On the international level, it is in our interests to closely follow the actions
of transnational groups, which may or may not have terrorist tendencies. Throughout this
study, much information on nations of the world was available, but very little open source
data seems to be collected on non-nation groups. If possible, we may want to collect the
key information identified in this study on these entities as well.

The US is also currently concerned with the reconstruction and stabilization of
other nations. In Iraq, government agencies are struggling with defining clear objectives
for the stabilization efforts, and measuring progress towards those objectives. As this
thesis highlights several of the key indicators of state strength, there is an opportunity to
contribute to the setting of goals and assessing the progress made in achieving them.
Planners could use such analysis to prioritize reconstruction activities within countries.
For example, recall that the geographic size of a nation is one of the key indicators of
stability found in this study. This might be due to the relationship between size and
available economic or natural resources. This suggests partitioning Iraq into three
smaller nations may be less desirable from a stability standpoint, though other factors
such as ethnic fractionalization may suggest otherwise. In addition, the significance of
Political Terror and Political Rights indicates that true stability in Iraq is more likely under
a free and democratic government.

The  principal  factors  that  describe  the  status  of  a  group  are  not  exclusive  to 
nations.  In  fact,  the  indicators  considered  in  this  thesis  dealing  with  financial  status, 
unemployment,  indebtedness,  health,  freedom,  a  sense  of  justice,  perception  of 
opportunity,  and  so  forth  are  the  key  variables  we  could  use  to  assess  any  population, 
down  to  individuals.  They  constitute  the  needs  of  all  people  and  could  therefore  provide 
insight  into  aiding  the  homeless,  gangs,  struggling  children,  employees  or  any  other 
group  of  people  we  are  interested  in. 

5.3.3.  Alternative  Missing  Data  Techniques 

In studies involving time-series data, it would be beneficial to explore methods
for imputing data across two or more dimensions. In this study, when data were missing
on a variable for a given year, the missing values were drawn from other similar countries
within the year of interest. However, we may achieve more accurate values in some
cases if we were to draw from other years in which data were available for the incomplete
country. Drawing exclusively from either an individual country’s populated years, or
from other countries, will not necessarily always produce optimal results. If data are
missing for a single year, interpolation may provide a more reasonable estimate than
using data from another country, similar as that country may be. However, if data were
rarely collected for a country, or only collected outside the time period of interest,
extrapolation may produce less accurate results than using data from other countries.
This may be particularly true if those countries have similar values on many other
variables. Note also that imputing over time is only possible if data have been collected at
least once on the country, which was not always the case in this study. Throughout our
literature review, we discovered no such multi-dimensional imputation methods.
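
As a small illustration of the single-year case discussed above, a gap with observed values on both sides can be filled by linear interpolation. This sketch was not part of the analysis in this thesis; the function name interpolate_single_gaps and the sample series are hypothetical.

```python
def interpolate_single_gaps(series):
    """Fill a missing year when both neighbouring years are observed.

    series: {year: value or None}. Returns a new dict; any gap without two
    observed neighbours is left as None for some other method to handle.
    """
    filled = dict(series)
    for year, value in series.items():
        if value is None:
            before, after = series.get(year - 1), series.get(year + 1)
            if before is not None and after is not None:
                filled[year] = (before + after) / 2.0
    return filled


# Hypothetical indicator values for one country.
observed = {2000: 10.0, 2001: None, 2002: 14.0, 2003: None}
filled = interpolate_single_gaps(observed)
```

The trailing gap in 2003 is deliberately left empty, since filling it would require extrapolation or values drawn from other countries.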

5.3.3.1. Multiple Imputation

Recall from Chapter 2 that Multiple Regression Imputation calculates estimates of
missing values by using a regression equation built from the non-missing data. This
approach to estimating missing values may be desirable in the sense that it uses all
available data and relationships among variables to impute missing values, and two
analysts working from the same incomplete dataset will generate the same imputed
dataset for use in future analysis. However, as the imputed values are a linear function of
the observed variables, the correlation among variables will necessarily be overstated
(Allison, 2001: 29). Moreover, the deterministic nature of the procedure implies that it
does not account for the uncertainty due to the missing data, thus variances and
covariances may be underestimated (Allison, 2001: 28). Multiple Imputation (MI) is an
imputation procedure which builds on this concept by including a variation factor along
with the predicted value for each imputation. Each time a new value is imputed, it will
include the variation inherent in the variable. This will allow us to create multiple
complete datasets with variation in the imputed values for further analysis.

In general, we are not interested in multiple results for a single study, so the next
step is to analyze each of the imputed datasets for the parameters of interest, and then
combine the results into a single point estimate. Chantala and Suchindran offer a simple
calculation for combining the results of multiple imputations (Chantala and Suchindran,
2006). In reality, however, we cannot know the true values of the population parameters.
To account for this uncertainty, we should draw the values of the parameters randomly
from their Bayesian posterior distributions (Allison, 2001: 31). One method for estimating
the posterior distributions of the parameters is the Data Augmentation Algorithm.
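
The combining step mentioned above can be sketched with the pooling rules usually attributed to Rubin. Whether this matches the calculation in the cited source exactly is an assumption on our part, and the function name combine_estimates is hypothetical.

```python
import math


def combine_estimates(estimates, variances):
    """Pool results from m imputed datasets.

    estimates: the parameter estimate from each imputed dataset.
    variances: the squared standard error of each estimate.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m  # average sampling variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between  # extra term reflects missing-data uncertainty
    return pooled, math.sqrt(total)
```

The between-imputation term is what a single deterministic imputation omits, which is why its standard errors tend to be too small.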


5.3.3.2.  Data  Augmentation 

Data augmentation is an iterative algorithm for finding posterior distributions
(Allison, 2001: 34). Allison describes the algorithm as consisting of the following
steps (Allison, 2001: 35).

0. Choose the variables for use in the imputation process. In addition to the
variables for which we wish to impute data, other variables may be included if
they are known to be highly correlated with, or have similar missing data
patterns to, the variables of interest. Also, while MI has been shown to be
robust to non-normally distributed data, the algorithm tends to converge faster
for the multivariate normal model. Therefore, if possible, transform variables
so that they at least approximately follow a normal distribution.

1. Choose starting values for the parameters. For the multivariate normal model,
the parameters are the means and covariance matrix. Starting values can be
gotten from the standard formulas using list-wise or pair-wise deletion.

2. Use the current values of the means and covariances to obtain estimates of
regression coefficients for equations in which each variable with missing data
is regressed on all observed variables. This is done for each pattern of
missing data.

3. Use the regression estimates to generate predicted values for all the missing
values. To each predicted value, add a random draw from the residual
distribution for that variable.

4. Using the “completed” dataset, with both observed and imputed values,
recalculate the means and covariance matrix using standard formulas.

5. Based on the newly calculated means and covariances, make a random draw
from the posterior distribution of the means and covariances.

6. Using the randomly drawn means and covariances, go back to Step 2 and
continue cycling through the subsequent steps until convergence is achieved.
The imputations that are produced during the final iteration are used to form a
completed dataset.
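
The steps above can be sketched for the simplest possible case: two variables, x fully observed and y partly missing. This is a simplified illustration under stated assumptions, not the procedure as implemented in any particular software package. All function names are hypothetical, a noninformative prior is assumed for Step 5 (the text above does not specify one), and a fixed number of iterations stands in for a convergence check.

```python
import math
import random


def cholesky2(m):
    """Lower-triangular Cholesky factor (l11, l21, l22) of a 2x2 matrix."""
    (a, b), (_, d) = m
    l11 = math.sqrt(a)
    l21 = b / l11
    return l11, l21, math.sqrt(d - l21 * l21)


def inverse2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def moments(xs, ys):
    """Means and covariance matrix (divisor n) of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    return [mx, my], [[sxx, sxy], [sxy, syy]]


def draw_posterior(mean, cov, n, rng):
    """Step 5. Assumed prior: noninformative, so that
    Sigma ~ inverse-Wishart(n - 1, n * cov) and mu | Sigma ~ N(mean, Sigma / n)."""
    l11, l21, l22 = cholesky2(inverse2([[n * v for v in row] for row in cov]))
    df = n - 1
    # Bartlett decomposition of a Wishart draw, which is then inverted.
    a11 = math.sqrt(rng.gammavariate(df / 2, 2))        # sqrt of chi-square(df)
    a22 = math.sqrt(rng.gammavariate((df - 1) / 2, 2))  # sqrt of chi-square(df - 1)
    a21 = rng.gauss(0, 1)
    t11, t21, t22 = l11 * a11, l21 * a11 + l22 * a21, l22 * a22
    sigma = inverse2([[t11 * t11, t11 * t21],
                      [t11 * t21, t21 * t21 + t22 * t22]])
    c11, c21, c22 = cholesky2([[v / n for v in row] for row in sigma])
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return [mean[0] + c11 * z1, mean[1] + c21 * z1 + c22 * z2], sigma


def data_augmentation(xs, ys, iterations=50, seed=0):
    """xs is fully observed; ys uses None for missing entries."""
    rng = random.Random(seed)
    missing = [i for i, y in enumerate(ys) if y is None]
    # Step 1: starting values from list-wise deletion.
    mu, sigma = moments([x for x, y in zip(xs, ys) if y is not None],
                        [y for y in ys if y is not None])
    completed = list(ys)
    for _ in range(iterations):
        # Step 2: regression of y on x implied by the current parameters.
        slope = sigma[0][1] / sigma[0][0]
        intercept = mu[1] - slope * mu[0]
        resid_sd = math.sqrt(sigma[1][1] - slope * sigma[0][1])
        # Step 3: predicted value plus a random residual draw.
        for i in missing:
            completed[i] = intercept + slope * xs[i] + rng.gauss(0, resid_sd)
        # Step 4: recompute moments; Step 5: draw new parameters; Step 6: repeat.
        mean, cov = moments(xs, completed)
        mu, sigma = draw_posterior(mean, cov, len(xs), rng)
    return completed


# Hypothetical data, for illustration only.
xs = [float(i) for i in range(1, 13)]
ys = [2.1, 3.9, None, 8.2, 9.8, 12.1, None, 16.0, 18.3, 19.9, None, 24.2]
completed = data_augmentation(xs, ys)
```

Running the function several times with different seeds would give the multiple completed datasets that MI requires.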

Multiple  Imputation  with  Data  Augmentation  is  a  technique  which  is  gaining 
popularity  as  automated  software  is  developed  and  tested.  Analysts  wishing  to  use  our 
dataset  for  future  studies  may  benefit  from  further  exploration  into  this  or  other  missing 
data  techniques. 


5.3.4.  Alternative  Discriminant  Analysis  Techniques 

As  shown  in  Chapter  4,  there  are  several  approaches  to  building  a  Discriminant 
Function.  A  quick  look  at  the  results  obtained  using  Mahalanobis’  Method  as  opposed  to 
Fisher’s  suggests  that  greater  classification  accuracy  may  be  achieved  with  a  similar 
reduced  set  of  variables  by  exploring  other  DA  methods. 

5.3.5.  Cluster  Analysis 

Cluster Analysis (CA) is a technique used to partition a set of subjects into two or
more disjoint groups (Lattin et al, 2003: 264-5, Dillon et al, 1984: 157-8). It does this by
using information captured in a set of independent variables to create the clearest possible
separation among the subjects, and assigning them to their most likely group (Lattin et al,
2003: 265). CA compares the within-group variation to the between-group variation,
reassigning members until the former is as small as possible in relation to the latter
(Dillon et al, 1984: 160).

Recall from the discussion of Factor Analysis that one of its primary
objectives is to reduce the dimensionality of a dataset. FA does this by grouping
variables which seem to reflect an underlying, latent factor. Similarly, CA may also be
thought of as a data reduction technique. However, rather than grouping variables
(columns) of a data matrix, the number of distinct observations (rows) is reduced to a
smaller number of observation clusters (Dillon et al, 1984: 161). If CA could be used to
effectively categorize the 200+ nations of the world into three or four distinct classes,
analysts would then be left only to decide which category appears to contain the majority
of critical states. Members of this cluster would then be candidates for being considered
failing. Clearly, other multivariate and operations research techniques can be applied to
improve our ability to aid subject matter experts and decision makers in the analysis and
classification of failing states.
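
One common way to carry out such a partition is k-means, sketched below. The discussion above does not commit to any particular clustering method, so this choice is ours, and the function name kmeans and the sample points are hypothetical. Starting from the first k points is a simplification, and the result depends on that starting choice.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Returns (cluster index for each point, list of cluster centres).
    """
    centres = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assign each point to its nearest centre (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster
        assignment = new_assignment
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centres


# Hypothetical one-dimensional scores, for illustration only.
points = [(0.0,), (5.0,), (10.0,), (0.2,), (5.1,), (9.8,)]
assignment, centres = kmeans(points, 3)
```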

5.4.  Conclusions 

Due to the unwavering commitment of terrorist organizations, it appears the
Global War on Terrorism will not end until we are able to disrupt their activities and
preclude them from recruiting additional personnel. This can best be accomplished by
taking a proactive approach in areas most likely to provide safe havens for terrorist
groups looking for asylum. This thesis provides a foundation for such efforts by first
identifying the key indicators of state failure through Factor Analysis and Discriminant
Analysis. DA is then further used to determine the likelihood that a state will experience
some form of crisis by constructing a discriminant function based on the appropriate
variables. A list of ten variables and the appropriate classification functions based on
these indicators are provided along with suggestions for their utilization in future studies.
It is our hope that this will enable the international community to predict likely trouble
spots and employ specific, targeted economic or political measures to prevent crises and
thwart the spread of terrorism. Doing so will save time, money, and most importantly
lives by addressing the issues likely to lead to costly violent conflict.

Appendix A: Initial Variables Considered


This  appendix  provides  a  list  of  all  variables  collected  for  the  initial  dataset  used 
in  this  thesis,  as  well  as  the  sources  of  the  data.  Series  names  are  annotated  with 
superscripts  corresponding  to  one  of  the  following  sources: 


1. CSCW - Centre for the Study of Civil War
2. EM-DAT: OFDA/CRED International Disaster Database: Centre for Research on
the Epidemiology of Disasters
3. Freedom House
4. Dr. Mark Gibney, University of North Carolina, Center for International Studies
5. Sean O’Brien - Center for Army Analysis
6. Polity IV Database - Center for Global Policy at George Mason University
7. RAND
8. UN Millennium Development Group
9. UN Office on Drugs and Crime
10. UN Common Database
11. UN Population Division, World Population Prospects, 2003
12. UN High Committee on Refugees
13. UN Statistics Division
14. UNESCO
15. World Bank


Code 

Data 

Availability 

Series 

100 

98% 

Population  11 

101 

95% 

School  age  population.  Primary.  Total  14 

102 

95% 

School  age  population.  Primary.  Male  14 

103 

95% 

School  age  population.  Secondary.  Total  14 

104 

95% 

School  age  population.  Secondary.  Male  14 

105 

95% 

School  age  population.  Tertiary.  Total  14 

106 

95% 

School  age  population.  Tertiary.  Male  14 

107 

94% 

Enrolment  in  total  secondary.  Public  and  private.  All  programmes.  Total  14 

108 

94% 

Enrolment  in  primary.  All  grades.  Total  14 

109 

85% 

Pupil-teacher  ratio.  Secondary  14 

110 

77% 

Public  expenditure  on  education  as  %  of  GDP  14 

111 

73% 

Calories  5 

112 

75% 

Youth  Bulge  5 

113 

73% 

Largest  Religion  %  5 

114 

74% 

Largest  Ethnic  Group  %  5 

115 

72% 

Trade  5 

A-l 


Code 

Data 

Availability 

Series 

116 

75% 

%  time  in  conflict  90-2003  5 

117 

100% 

Battle  Deaths  (Zero  when  empty)  1 

118 

73% 

Refugees  12 

119 

98% 

GDP  Per  Capita  13 

120 

82% 

Political  Terror  4 

121   92%  Freedom of Press 3
122   91%  Political Rights 3
123   91%  Civil Liberties 3
124   92%  Agricultural land (% of land area) 15
125   84%  Agriculture, value added (% of GDP) 15
126   81%  Aid per capita (current US$) 15
127   82%  Arms imports (constant 1990 US$) 15
128   84%  Births attended by skilled health staff (% of total) 15
129   90%  CO2 emissions (metric tons per capita) 15
130   79%  Consumer price index (2000 = 100) 15
131   69%  Contraceptive prevalence (% of women ages 15-49) 15
132   62%  Electric power consumption (kWh per capita) 15
133   61%  Energy imports, net (% of energy use) 15
134   66%  Expenditure per student, primary (% of GDP per capita) 5
135   87%  Exports of goods and services (% of GDP) 15
136   77%  Exports of goods and services (annual % growth) 15
137   73%  Exports of goods and services (constant 2000 US$) 15
138   65%  External debt, total (DOD, current US$) 15
139   92%  Fertility rate, total (births per woman) 15
140   93%  Fixed line and mobile phone subscribers (per 1,000 people) 15
141   81%  Food imports (% of merchandise imports) 15
142   90%  Forest area (% of land area) 15
143   88%  GDP per capita (constant 2000 US$) 15
144   90%  GDP per capita growth (annual %) 15
145   88%  GNI per capita, Atlas method (current US$) 15
146   81%  GNI per capita, PPP (current international $) 5
147   90%  Health expenditure per capita (current US$) 15
148   71%  Hospital beds (per 1,000 people) 5
149   89%  Households with television (%) 15
150   90%  Immunization, measles (% of children ages 12-23 months) 15
151   83%  Improved water source (% of population with access) 15
152   90%  Inflation, GDP deflator (annual %) 15
153   69%  International tourism, expenditures (% of total imports) 15
154   94%  Internet users (per 1,000 people) 15
155   93%  Land area (sq. km) 15
156   92%  Life expectancy at birth, total (years) 15
157   55%  Literacy rate, adult total (% of people ages 15 and above) 15
158   52%  Literacy rate, youth total (% of people ages 15-24) 15
159   75%  Military expenditure (% of GDP) 15
160   81%  Military personnel (% of total labor force) 15
161   93%  Mobile phone subscribers (per 1,000 people) 15
162   89%  Mortality rate, infant (per 1,000 live births) 15
163   89%  Mortality rate, under-5 (per 1,000) 15
164   89%  Net migration 15
165   81%  Personal computers (per 1,000 people) 15
166   92%  Population density (people per sq. km) 15
167   95%  Population growth (annual %) 15
168   44%  Poverty headcount ratio at $2 a day (PPP) (% of population) 15
169   70%  Prevalence of HIV, total (% of population ages 15-49) 5
170   79%  Primary completion rate, female (% of relevant age group) 15
171   80%  Primary completion rate, total (% of relevant age group) 15
172   89%  Proportion of seats held by women in national parliament (%) 15
173   61%  Public spending on education, total (% of government expenditure) 15
174   87%  Pupil-teacher ratio, primary 15
175   80%  Ratio of female to male enrollments in tertiary education 15
176   88%  Ratio of female to male primary enrollment 15
177   87%  Ratio of girls to boys in primary and secondary education (%) 15
178   78%  Refugee population by country or territory of asylum 15
179   96%  Rural population (% of total population) 15
180   93%  Rural population growth (annual %) 15
181   86%  Telecommunications revenue (% GDP) 15
182   60%  Total debt service (% of exports of goods, services and income) 13
183   63%  Unemployment, male (% of male labor force) 15
184   65%  Unemployment, total (% of total labor force) 15
185   96%  Urban population (% of total) 15
186   65%  Use of IMF credit (DOD, current US$) 15
187   85%  Maternal mortality ratio per 100,000 live births 8
188   89%  Total number of seats in national parliament 8
189   88%  Seats held by women in national parliament, percentage 8
190   92%  Children under five mortality rate per 1,000 live births 8
191   92%  Infant mortality rate (0-1 year) per 1,000 live births 8
192   91%  Children 1 year old immunized against measles, percentage 8
193   70%  Population undernourished, percentage 8
194   97%  Land area covered by forest, percentage 8
195   82%  Births attended by skilled health personnel, percentage 8
196   69%  AIDS deaths 8
197   44%  Population below $1 (PPP) per day consumption, percentage 8
198   31%  Population below national poverty line, total, percentage 8
199   44%  Poverty gap ratio 8
200   78%  Net enrolment ratio in primary education, both sexes 8
201   60%  Percentage of pupils starting grade 1 reaching grade 5, both sexes 8
202   52%  Youth unemployment rate, aged 15-24, men 8
203   95%  Telephone lines and cellular subscribers per 100 population 8
204   95%  Internet users per 100 population 8
205   80%  Personal computers 8
206   91%  Gender Parity Index in primary level enrolment 8
207   90%  Gender Parity Index in secondary level enrolment 8
208   80%  Gender Parity Index in tertiary level enrolment 8
209   88%  Protected area to total surface area, percentage 8
210   99%  Tuberculosis prevalence rate per 100,000 population 8
211   89%  Tuberculosis treatment success rate under DOTS, percentage 8
212   53%  Youth unemployment rate, aged 15-24, both sexes 8
213   78%  Net enrolment ratio in primary education, boys 8
214   78%  Net enrolment ratio in primary education, girls 8
215   99%  Tuberculosis death rate per 100,000 population 8
216   58%  Energy use (Kg oil equivalent) per $1,000 (PPP) GDP 8
217   82%  Consumption of ozone-depleting CFCs in ODP metric tons 8
218   61%  Debt service as % of exports of goods and services and net income from abroad 8
219   52%  Literacy rates of 15-24 years old, both sexes, percentage 8
220   88%  Seats held by men in national parliament 8
221   88%  Seats held by women in national parliament 8
222   86%  Proportion of the population using improved drinking water sources, total 8
223   80%  Proportion of the population using improved sanitation facilities, total 8
224   49%  Slum population as percentage of urban, percentage 8
225   71%  Share of women in wage employment in the non-agricultural sector 8
226   73%  People living with HIV, 15-49 years old, percentage 8
227   39%  Women 15-24 years old, who know that a healthy-looking person can transmit HIV, percentage 8
228   81%  Primary completion rate, both sexes 8
229   80%  Primary completion rate, boys 8
230   80%  Primary completion rate, girls 8
231   95%  Carbon dioxide emissions (CO2), metric tons of CO2 per capita (CDIAC) 8
232   82%  Consumption of all Ozone-Depleting Substances in ODP metric tons 8
233   50%  Number of Recorded Crimes 9
234   40%  Number of Recorded Murders Attempted 9
235   49%  Number of Recorded Drug Crimes 9
236   95%  Number of Disaster Related Deaths (Zero when empty) 2
237  100%  Number of Terrorist Attacks Attempted and/or Completed (Zero when empty) 7
238  100%  Number of Fatalities Due to Terrorist Attacks (Zero when empty) 7
239  100%  Agricultural production index, 1999-2001=100 10
240   91%  Agricultural production per capita index, 1999-2001=100 10
241   77%  AIDS/HIV adult infections prevalence, % (UNAIDS estimates) 10
242   80%  Balance of Payments: exports of goods, free on board, US$ (IMF) 10
243   80%  Balance of Payments: imports of goods, free on board, US$ (IMF) 10
244   80%  Balance of Payments: trade balance, goods and services, US$ (IMF) 10
245   88%  Death rate, crude per 1,000 10
246   83%  Exchange rate, US$ per national currency (IMF) 10
247   98%  GDP annual growth rate, 1990 prices, US$ 10
248   74%  Imports of goods and services, current prices 10
249   88%  Infant mortality rate per 1,000 live births 10
250   88%  Migration, international net rate per year 10
251   94%  Telephone lines and cellular subscribers per 100 population 10
252  100%  Data Availability
253   95%  School age population. Primary. Total %
254   95%  School age population. Primary. Male %
255   95%  School age population. Secondary. Total %
256   95%  School age population. Secondary. Male %
257   95%  School age population. Tertiary. Total %
258   95%  School age population. Tertiary. Male %
259   93%  Enrolment in total secondary. Public and private. All programs. Total %
260   93%  Enrolment in primary. All grades. Total %
261   68%  AIDS deaths Per 1000 Pop
262   50%  Number of Recorded Crimes Per 1000 Pop
263   40%  Number of Recorded Murders Attempted Per 1000 Pop
264   48%  Number of Recorded Drug Crimes Per 1000 Pop
265   76%  Autocracy-Democracy Scale (-10 to 10) 6
266   77%  Government Stability - Years since last government change 6

Appendix B: Reduced List of 60 Variables


Appendix  B  provides  a  list  of  the  variables  retained  after  completing  the 
correlation  analysis. 


Retained  Transformation  Series
100       ln              Population
110       none            Public expenditure on education as % of GDP
113       none            Largest Religion %
114       none            Largest Ethnic Group %
116       none            % time in conflict 90-2003
118       ln              Refugees
119       ln              GDP Per Capita
120       none            Political Terror
122       none            Political Rights
124       none            Agricultural land (% of land area)
126       ln              Aid per capita (current US$)
130       ln              Consumer price index (2000 = 100)
133       ln              Energy imports, net (% of energy use)
135       none            Exports of goods and services (% of GDP)
136       ln              Exports of goods and services (annual % growth)
141       none            Food imports (% of merchandise imports)
144       ln              GDP per capita growth (annual %)
152       ln              Inflation, GDP deflator (annual %)
153       ln              International tourism, expenditures (% of total imports)
155       ln              Land area (sq. km)
159       ln              Military expenditure (% of GDP)
160       ln              Military personnel (% of total labor force)
166       ln              Population density (people per sq. km)
167       ln              Population growth (annual %)
172       ln              Proportion of seats held by women in national parliament (%)
174       ln              Pupil-teacher ratio, primary
175       none            Ratio of female to male enrollments in tertiary education
177       none            Ratio of girls to boys in primary and secondary education (%)
180       ln              Rural population growth (annual %)
181       ln              Telecommunications revenue (% GDP)
182       ln              Total debt service (% of exports of goods, services and income)
184       ln              Unemployment, total (% of total labor force)
185       none            Urban population (% of total)
186       none            Use of IMF credit (DOD, current US$)
190       none            Children under five mortality rate per 1,000 live births
192       none            Children 1 year old immunized against measles, percentage
193       ln              Population undernourished, percentage
209       ln              Protected area to total surface area, percentage
211       none            Tuberculosis treatment success rate under DOTS, percentage
215       ln              Tuberculosis death rate per 100,000 population
216       ln              Energy use (Kg oil equivalent) per $1,000 (PPP) GDP
221       ln              Seats held by women in national parliament
225       none            Share of women in wage employment in the non-agricultural sector
231       ln              Carbon dioxide emissions (CO2), metric tons of CO2 per capita (CDIAC)
236       none            Number of Disaster Related Deaths (Zero when empty)
239       none            Agricultural production index, 1999-2001=100
243       ln(-)           Balance of Payments: imports of goods, free on board, US$ (IMF)
244       none            Balance of Payments: trade balance, goods and services, US$ (IMF)
246       ln              Exchange rate, US$ per national currency (IMF)
247       none            GDP annual growth rate, 1990 prices, US$
248       ln              Imports of goods and services, current prices
250       none            Migration, international net rate per year
252       none            Count of entries
253       none            School age population. Primary. Total %
257       none            School age population. Tertiary. Total %
259       none            Enrolment in total secondary. Public and private. All programmes. Total %
262       ln              Number of Recorded Crimes Per 1000 Pop
263       ln              Number of Recorded Murders Attempted Per 1000 Pop
264       ln              Number of Recorded Drug Crimes Per 1000 Pop
266       ln              Government Stability - Years since last government change

Appendix C: Nations Analyzed


Afghanistan 

Croatia 

Jordan 

Albania 

Cuba 

Kazakhstan 

Algeria 

Cyprus 

Kenya 

Andorra 

Czech  Republic 

Kiribati 

Angola 

Denmark 

Korea,  North 

Anguilla 

Djibouti 

Korea,  South 

Antigua  and  Barbuda 

Dominica 

Kuwait 

Argentina 

Dominican  Republic 

Kyrgyzstan 

Armenia 

East  Timor 

Laos 

Aruba 

Ecuador 

Latvia 

Australia 

Egypt 

Lebanon 

Austria 

El  Salvador 

Lesotho 

Azerbaijan 

Equatorial  Guinea 

Liberia 

Bahamas 

Eritrea 

Libya 

Bahrain 

Estonia 

Liechtenstein 

Bangladesh 

Ethiopia 

Lithuania 

Barbados 

Fiji 

Luxembourg 

Belarus 

Finland 

Macau 

Belgium 

France 

Macedonia 

Belize 

French  Polynesia 

Madagascar 

Benin 

Gabon 

Malawi 

Bermuda 

Gambia,  The 

Malaysia 

Bhutan 

Gaza  Strip 

Maldives 

Bolivia 

Georgia 

Mali 

Bosnia  and  Herzegovina 

Germany 

Malta 

Botswana 

Ghana 

Marshall  Islands 

Brazil 

Greece 

Mauritania 

Brunei 

Grenada 

Mauritius 

Bulgaria 

Guam 

Mexico 

Burkina  Faso 

Guatemala 

Micronesia 

Burma 

Guinea 

Moldova 

Burundi 

Guinea-Bissau 

Mongolia 

Cambodia 

Guyana 

Morocco 

Cameroon 

Haiti 

Mozambique 

Canada 

Honduras 

Namibia 

Cape  Verde 

Hong  Kong 

Nepal 

Cayman  Islands 

Hungary 

Netherlands 

Central  African  Republic 

Iceland 

Netherlands  Antilles 

Chad 

India 

New  Caledonia 

Chile 

Indonesia 

New  Zealand 

China 

Iran 

Nicaragua 

Colombia 

Iraq 

Niger 

Comoros 

Ireland 

Nigeria 

Congo 

Israel 

Norway

Congo,  DRC 

Italy 

Oman 

Costa  Rica 

Jamaica 

Pakistan 

Cote  d'Ivoire 

Japan 

Palau

Panama

Serbia 

Togo 

Papua  New  Guinea 

Seychelles 

Tonga 

Paraguay 

Sierra  Leone 

Trinidad  and  Tobago 

Peru 

Singapore 

Tunisia

Philippines 

Slovakia 

Turkey 

Poland 

Slovenia 

Turkmenistan 

Portugal 

Solomon  Islands 

Uganda 

Puerto  Rico 

Somalia 

Ukraine 

Qatar 

South  Africa 

United  Arab  Emirates 

Romania 

Spain 

United  Kingdom 

Russia 

Sri  Lanka 

United  States 

Rwanda 

Sudan 

Uruguay 

St  Kitts  and  Nevis 

Suriname 

Uzbekistan 

St  Lucia 

Swaziland 

Vanuatu 

St  Vincent  and  the  Grenadines 

Sweden 

Venezuela 

Samoa 

Switzerland 

Vietnam 

San  Marino 

Syria 

Yemen 

Sao  Tome  and  Principe 

Tajikistan 

Zambia 

Saudi  Arabia 

Tanzania 

Zimbabwe 

Senegal 

Thailand 

Appendix D: States in Crisis - Model Output


This table provides a ranked listing of states in order of their second discriminant
function (F2) scores, also known as the Big Picture function. Shown are the prior (Barnett)
and posterior (Model) classifications, the calculated probability of belonging to each
group, and both discriminant function scores (F1 and F2). Countries listed first are most
likely to experience crises, based on the findings of this thesis.


Observation                       Barnett  Model  Pr(Core)  Pr(Gap)  Pr(Rim)      F1      F2
Burundi                           Gap      Gap    0.001     0.999    0.000    -2.322  -3.356
Congo, DRC                        Gap      Gap    0.001     0.997    0.002    -0.681  -3.147
Sierra Leone                      Gap      Gap    0.001     0.999    0.000    -3.913  -3.120
Haiti                             Gap      Gap    0.001     0.997    0.002    -0.743  -3.017
Equatorial Guinea                 Gap      Gap    0.001     0.999    0.000    -3.121  -2.980
Rwanda                            Gap      Gap    0.001     0.999    0.000    -2.783  -2.952
Eritrea                           Gap      Gap    0.001     0.998    0.001    -1.169  -2.904
Somalia                           Gap      Gap    0.002     0.998    0.000    -2.059  -2.882
Zimbabwe                          Gap      Gap    0.001     0.776    0.223     1.046  -2.860
Guinea-Bissau                     Gap      Gap    0.002     0.998    0.000    -3.339  -2.857
Chad                              Gap      Gap    0.002     0.990    0.008    -0.324  -2.767
Togo                              Gap      Gap    0.002     0.996    0.003    -0.741  -2.753
Angola                            Gap      Gap    0.002     0.998    0.000    -1.790  -2.748
Comoros                           Gap      Gap    0.003     0.997    0.000    -2.676  -2.661
Central African Republic          Gap      Gap    0.003     0.996    0.000    -1.447  -2.565
Afghanistan                       Gap      Gap    0.002     0.762    0.236     0.940  -2.560
Malawi                            Gap      Gap    0.003     0.987    0.010    -0.348  -2.535
Korea, North                      Gap      Gap    0.003     0.992    0.005    -0.604  -2.529
Cameroon                          Gap      Gap    0.003     0.992    0.005    -0.575  -2.526
Tajikistan                        Gap      Gap    0.004     0.996    0.000    -2.166  -2.472
Niger                             Gap      Gap    0.004     0.996    0.000    -1.585  -2.468
Ethiopia                          Gap      Gap    0.004     0.994    0.002    -1.049  -2.418
Yemen                             Gap      Gap    0.003     0.758    0.239     0.884  -2.418
Liberia                           Gap      Gap    0.005     0.995    0.000    -2.816  -2.396
Guinea                            Gap      Gap    0.005     0.995    0.000    -1.755  -2.393
Uzbekistan                        Gap      Gap    0.005     0.979    0.017    -0.237  -2.330
Nepal                             Gap      Gap    0.003     0.705    0.292     0.944  -2.330
Sudan                             Gap      Rim    0.001     0.330    0.669     1.525  -2.314
Congo                             Gap      Gap    0.006     0.991    0.003    -0.854  -2.271
Iraq                              Gap      Gap    0.006     0.985    0.010    -0.461  -2.264
Laos                              Gap      Gap    0.007     0.992    0.002    -1.126  -2.212
Gambia, The                       Gap      Gap    0.009     0.991    0.000    -3.262  -2.197
Cote d'Ivoire                     Gap      Gap    0.007     0.989    0.003    -0.902  -2.163
Djibouti                          Gap      Gap    0.008     0.991    0.000    -1.826  -2.149
Burkina Faso                      Gap      Gap    0.008     0.991    0.001    -1.225  -2.147
Zambia                            Gap      Gap    0.008     0.991    0.000    -1.649  -2.142
Uganda                            Gap      Gap    0.009     0.987    0.004    -0.849  -2.093
Cambodia                          Gap      Gap    0.011     0.987    0.002    -1.280  -1.989
Swaziland                         Gap      Gap    0.011     0.981    0.008    -0.691  -1.968
Pakistan                          Rim      Rim    0.001     0.080    0.919     1.968  -1.864
Anguilla                          Gap      Gap    0.019     0.981    0.000    -2.330  -1.811
Mauritania                        Gap      Gap    0.017     0.979    0.004    -1.049  -1.790
Tanzania                          Gap      Gap    0.018     0.977    0.005    -0.943  -1.774
Turkmenistan                      Gap      Gap    0.019     0.980    0.001    -1.550  -1.772
Burma                             Gap      Gap    0.018     0.955    0.027    -0.316  -1.726
Mozambique                        Gap      Gap    0.023     0.973    0.004    -1.039  -1.663
Mali                              Gap      Gap    0.024     0.975    0.001    -1.467  -1.662
Bangladesh                        Gap      Rim    0.005     0.285    0.710     1.310  -1.660
Sao Tome and Principe             Gap      Gap    0.027     0.972    0.001    -1.661  -1.618
Kenya                             Gap      Gap    0.013     0.590    0.397     0.807  -1.617
Maldives                          Gap      Gap    0.029     0.970    0.000    -2.516  -1.616
Nigeria                           Gap      Rim    0.007     0.339    0.653     1.189  -1.602
Paraguay                          Gap      Rim    0.001     0.042    0.957     2.093  -1.580
Tonga                             Core     Gap    0.031     0.968    0.001    -1.836  -1.562
Saint Kitts and Nevis             Gap      Gap    0.032     0.967    0.001    -1.517  -1.539
Solomon Islands                   Gap      Gap    0.038     0.961    0.000    -1.995  -1.477
Madagascar                        Gap      Gap    0.040     0.959    0.001    -1.635  -1.438
Guatemala                         Gap      Gap    0.036     0.826    0.138     0.170  -1.343
Bhutan                            Gap      Gap    0.054     0.945    0.001    -1.912  -1.314
Trinidad and Tobago               Gap      Gap    0.039     0.831    0.130     0.128  -1.310
Armenia                           Gap      Gap    0.053     0.946    0.001    -1.665  -1.309
Benin                             Gap      Gap    0.055     0.945    0.001    -1.952  -1.309
Azerbaijan                        Gap      Gap    0.055     0.943    0.002    -1.490  -1.287
Gabon                             Gap      Gap    0.059     0.875    0.066    -0.201  -1.168
Thailand                          Rim      Rim    0.000     0.005    0.995     2.701  -1.159
Botswana                          Gap      Gap    0.054     0.746    0.200     0.246  -1.122
Papua New Guinea                  Gap      Gap    0.066     0.858    0.076    -0.169  -1.108
Antigua and Barbuda               Gap      Gap    0.099     0.899    0.002    -1.515  -1.010
India                             Rim      Rim    0.000     0.001    0.999     3.151  -1.004
Saint Vincent and the Grenadines  Gap      Gap    0.100     0.896    0.004    -1.338  -0.995
Bolivia                           Gap      Gap    0.083     0.809    0.108    -0.077  -0.978
Syria                             Gap      Rim    0.007     0.080    0.912     1.558  -0.959
Sri Lanka                         Gap      Gap    0.088     0.789    0.124    -0.032  -0.943
Iran                              Gap      Rim    0.000     0.003    0.997     2.800  -0.925
Vietnam                           Gap      Gap    0.099     0.787    0.114    -0.084  -0.893
Nicaragua                         Gap      Gap    0.113     0.852    0.035    -0.553  -0.889
Senegal                           Gap      Gap    0.134     0.865    0.001    -1.850  -0.876
Dominican Republic                Gap      Gap    0.061     0.488    0.451     0.588  -0.864
Venezuela                         Gap      Rim    0.035     0.270    0.695     0.957  -0.840
Aruba                             Gap      Gap    0.172     0.828    0.000    -2.741  -0.784
Algeria                           Rim      Rim    0.005     0.039    0.956     1.763  -0.777
Honduras                          Gap      Gap    0.150     0.823    0.027    -0.692  -0.754
Gaza Strip                        Gap      Gap    0.150     0.810    0.039    -0.557  -0.741
Kyrgyzstan                        Gap      Gap    0.159     0.812    0.029    -0.683  -0.722
Mongolia                          Core     Gap    0.162     0.805    0.033    -0.633  -0.708
Dominica                          Gap      Gap    0.219     0.781    0.000    -3.056  -0.667
Vanuatu                           Core     Gap    0.224     0.776    0.000    -2.819  -0.644
Philippines                       Rim      Rim    0.004     0.023    0.972     1.892  -0.624
Seychelles                        Gap      Gap    0.220     0.779    0.001    -1.897  -0.615
Saudi Arabia                      Rim      Rim    0.002     0.010    0.988     2.191  -0.609
Georgia                           Gap      Gap    0.214     0.781    0.005    -1.368  -0.605
Lesotho                           Gap      Gap    0.148     0.599    0.253     0.173  -0.585
Kazakhstan                        Gap      Gap    0.187     0.709    0.104    -0.224  -0.573
China                             Rim      Rim    0.007     0.028    0.966     1.786  -0.532
Bermuda                           Gap      Gap    0.272     0.727    0.000    -2.301  -0.509
Guyana                            Gap      Gap    0.244     0.734    0.022    -0.844  -0.499
Kuwait                            Gap      Gap    0.135     0.443    0.422     0.429  -0.482
Jamaica                           Gap      Gap    0.278     0.719    0.003    -1.564  -0.463
Namibia                           Gap      Gap    0.225     0.660    0.115    -0.209  -0.461
Colombia                          Gap      Rim    0.117     0.359    0.524     0.571  -0.446
Libya                             Gap      Gap    0.162     0.471    0.366     0.332  -0.433
Belize                            Gap      Gap    0.238     0.621    0.140    -0.137  -0.406
Morocco                           Rim      Rim    0.032     0.094    0.874     1.237  -0.399
Moldova                           Core     Gap    0.307     0.686    0.007    -1.289  -0.388
El Salvador                       Gap      Gap    0.215     0.547    0.238     0.096  -0.386
Egypt                             Gap      Rim    0.020     0.056    0.924     1.438  -0.382
Suriname                          Gap      Gap    0.228     0.561    0.211     0.038  -0.373
Oman                              Gap      Gap    0.275     0.649    0.076    -0.394  -0.373
Netherlands Antilles              Gap      Gap    0.356     0.644    0.000    -2.945  -0.366
Ghana                             Gap      Gap    0.269     0.623    0.108    -0.255  -0.360
Fiji                              Core     Gap    0.311     0.671    0.018    -0.943  -0.358
Cuba                              Gap      Gap    0.314     0.671    0.015    -1.010  -0.356
Grenada                           Gap      Gap    0.358     0.642    0.000    -2.801  -0.355
Jordan                            Gap      Gap    0.331     0.662    0.007    -1.306  -0.339
East Timor                        Gap      Gap    0.305     0.640    0.055    -0.528  -0.328
Bahrain                           Gap      Gap    0.357     0.634    0.009    -1.220  -0.285
Albania                           Gap      Gap    0.356     0.603    0.041    -0.655  -0.239
Panama                            Gap      Gap    0.395     0.581    0.024    -0.860  -0.187
Saint Lucia                       Gap      Gap    0.454     0.545    0.000    -2.318  -0.160
Lebanon                           Gap      Gap    0.434     0.557    0.009    -1.246  -0.143
Indonesia                         Rim      Rim    0.000     0.000    1.000     3.726  -0.138
Cape Verde                        Gap      Gap    0.368     0.487    0.145    -0.166  -0.111
Malta                             Core     Gap    0.478     0.521    0.001    -2.105  -0.109
Macedonia                         Gap      Core   0.514     0.485    0.001    -2.152  -0.047
Belarus                           Core     Gap    0.450     0.495    0.055    -0.563  -0.047
Samoa                             Core     Core   0.555     0.445    0.000    -2.714   0.000
New Caledonia                     Core     Core   0.528     0.469    0.003    -1.639   0.001
Ecuador                           Gap      Rim    0.085     0.093    0.822     1.029   0.019
Brazil                            Rim      Rim    0.006     0.007    0.987     2.058   0.075
Tunisia                           Gap      Core   0.387     0.342    0.271     0.112   0.078
United Arab Emirates              Gap      Rim    0.037     0.036    0.928     1.399   0.089
Peru                              Gap      Rim    0.055     0.052    0.893     1.242   0.090
Mauritius                         Gap      Core   0.532     0.407    0.062    -0.515   0.114
Bosnia and Herzegovina            Gap      Core   0.608     0.391    0.001    -2.173   0.120
Micronesia                        Core     Core   0.605     0.378    0.016    -1.009   0.181
Bahamas                           Core     Core   0.616     0.371    0.014    -1.078   0.195
Qatar                             Gap      Rim    0.300     0.195    0.505     0.483   0.227
Puerto Rico                       Gap      Core   0.667     0.331    0.002    -1.812   0.248
Brunei                            Gap      Rim    0.355     0.206    0.439     0.391   0.273
South Africa                      Rim      Rim    0.017     0.010    0.973     1.783   0.310
Kiribati                          Core     Core   0.677     0.311    0.012    -1.103   0.312
Turkey                            Rim      Rim    0.002     0.001    0.997     2.637   0.338
Serbia                            Gap      Core   0.738     0.260    0.001    -1.900   0.393
French Polynesia                  Core     Core   0.746     0.254    0.001    -2.198   0.396
Israel                            Gap      Core   0.726     0.266    0.009    -1.216   0.407
Palau                             Gap      Core   0.744     0.251    0.005    -1.448   0.432
Russia                            Rim      Rim    0.038     0.017    0.945     1.528   0.454
Singapore                         Gap      Core   0.762     0.221    0.017    -0.953   0.520
Bulgaria                          Gap      Core   0.782     0.210    0.008    -1.218   0.543
Costa Rica                        Gap      Core   0.559     0.171    0.270     0.159   0.543
Barbados                          Gap      Core   0.811     0.189    0.000    -2.607   0.544
Croatia                           Gap      Core   0.758     0.212    0.030    -0.733   0.545
Mexico                            Rim      Rim    0.009     0.003    0.988     2.125   0.628
Guam                              Core     Core   0.744     0.167    0.089    -0.293   0.660
Iceland                           Core     Core   0.863     0.133    0.004    -1.414   0.774
Uruguay                           Core     Core   0.825     0.134    0.040    -0.563   0.788
Hong Kong                         Core     Rim    0.323     0.047    0.630     0.795   0.895
Argentina                         Rim      Rim    0.139     0.021    0.840     1.200   0.904
Slovakia                          Core     Core   0.881     0.105    0.014    -0.928   0.907
Malaysia                          Rim      Rim    0.100     0.014    0.887     1.354   0.942
Ukraine                           Core     Core   0.629     0.076    0.295     0.308   0.955
Cyprus                            Gap      Core   0.918     0.082    0.000    -2.352   0.977
Chile                             Rim      Core   0.540     0.057    0.402     0.499   1.021
Marshall Islands                  Core     Core   0.922     0.077    0.001    -1.884   1.022
Macau                             Core     Core   0.895     0.068    0.037    -0.491   1.127
San Marino                        Core     Core   0.941     0.059    0.000    -2.115   1.141
Cayman Islands                    Gap      Core   0.943     0.057    0.000    -2.434   1.145
Romania                           Gap      Core   0.562     0.042    0.396     0.539   1.177
Latvia                            Core     Core   0.926     0.060    0.014    -0.835   1.182
New Zealand                       Core     Core   0.906     0.056    0.038    -0.456   1.215
Luxembourg                        Core     Core   0.937     0.054    0.009    -0.997   1.222
Liechtenstein                     Core     Core   0.955     0.044    0.001    -1.611   1.295
Slovenia                          Core     Core   0.956     0.040    0.003    -1.280   1.348
Andorra                           Gap      Core   0.961     0.039    0.001    -1.770   1.349
Norway                            Core     Core   0.954     0.036    0.010    -0.883   1.414
Estonia                           Core     Core   0.965     0.032    0.003    -1.251   1.453
Greece                            Rim      Core   0.916     0.029    0.055    -0.212   1.514
Australia                         Core     Core   0.735     0.024    0.241     0.405   1.536
Czech Republic                    Core     Core   0.962     0.025    0.013    -0.713   1.585
Hungary                           Core     Core   0.941     0.025    0.034    -0.361   1.589
Portugal                          Core     Core   0.932     0.025    0.043    -0.273   1.595
Lithuania                         Core     Core   0.972     0.022    0.006    -0.954   1.634
Korea, South                      Rim      Rim    0.159     0.004    0.837     1.455   1.688
Ireland                           Core     Core   0.977     0.017    0.006    -0.905   1.752
Poland                            Core     Core   0.841     0.015    0.143     0.267   1.784
Denmark                           Core     Core   0.985     0.014    0.000    -1.902   1.788
Finland                           Core     Core   0.986     0.013    0.002    -1.379   1.859
Switzerland                       Core     Core   0.988     0.011    0.001    -1.605   1.903
Austria                           Core     Core   0.985     0.012    0.003    -1.139   1.908
United Kingdom                    Core     Core   0.971     0.011    0.018    -0.468   1.943
Canada                            Core     Core   0.901     0.010    0.089     0.149   1.983
Spain                             Core     Core   0.890     0.010    0.100     0.198   1.989
Sweden                            Core     Core   0.991     0.009    0.000    -1.757   2.021
Italy                             Core     Core   0.982     0.009    0.008    -0.704   2.029
Netherlands                       Core     Core   0.991     0.008    0.001    -1.417   2.058
France                            Core     Core   0.972     0.008    0.020    -0.353   2.128
Belgium                           Core     Core   0.993     0.006    0.001    -1.432   2.221
Japan                             Core     Core   0.951     0.005    0.044     0.011   2.324
Germany                           Core     Core   0.994     0.003    0.003    -0.973   2.455
United States                     Core     Core   0.972     0.003    0.025    -0.108   2.558
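
As a rough illustration of how the columns of the Appendix D table relate to one another, the sketch below takes four rows from the table, orders them by their F2 score, and recovers the posterior (Model) class from the three group probabilities. It assumes that the Model column is simply the group with the largest posterior probability; this is consistent with the rows shown but is an assumption, not a statement of how the thesis software assigned classes. The `Row` type and the function names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Row(NamedTuple):
    name: str
    barnett: str    # prior classification (Barnett column)
    pr_core: float  # posterior probability of the Core group
    pr_gap: float   # posterior probability of the Gap group
    pr_rim: float   # posterior probability of the Rim group
    f1: float       # first discriminant function score
    f2: float       # second discriminant function score ("Big Picture")


# Four rows taken from the Appendix D table.
ROWS = [
    Row("United States", "Core", 0.972, 0.003, 0.025, -0.108, 2.558),
    Row("Sudan", "Gap", 0.001, 0.330, 0.669, 1.525, -2.314),
    Row("Burundi", "Gap", 0.001, 0.999, 0.000, -2.322, -3.356),
    Row("Tunisia", "Gap", 0.387, 0.342, 0.271, 0.112, 0.078),
]


def posterior_class(row: Row) -> str:
    """Return the group with the largest posterior probability (assumed rule)."""
    probs = {"Core": row.pr_core, "Gap": row.pr_gap, "Rim": row.pr_rim}
    return max(probs, key=probs.get)


def rank_by_f2(rows: List[Row]) -> List[Row]:
    """Order rows from lowest to highest F2, i.e. most crisis-prone first."""
    return sorted(rows, key=lambda r: r.f2)


def reclassified(rows: List[Row]) -> List[Tuple[str, str, str]]:
    """List (name, prior, posterior) for rows whose two classes differ."""
    return [
        (r.name, r.barnett, posterior_class(r))
        for r in rows
        if posterior_class(r) != r.barnett
    ]


if __name__ == "__main__":
    for r in rank_by_f2(ROWS):
        print(f"{r.name:15s} prior={r.barnett:4s} model={posterior_class(r)}")
    print(reclassified(ROWS))
```

Sorting on F2 alone reproduces the ordering of the table, while the comparison of prior and posterior classes picks out the states, such as Sudan and Tunisia, that the model places in a different group than the Barnett classification.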

Appendix E: Failed States Index Comparison

The  Fund  for  Peace  (FFP)  publishes  an  annual  Failed  States  Index  which  provides 
scores  for  each  country  indicating  their  current  stability.  This  appendix  provides  a 
comparison  between  the  2006  Index  and  the  scores  each  nation  received  on  the  Big 
Picture  Function  as  defined  in  this  study.  The  two  lists  were  compared  using  Spearman’s 
Rank  Correlation  Coefficient  to  test  whether  or  not  there  are  statistically  significant 
differences  between  the  ranks  assigned  to  each  country  using  the  two  methods.  Note  that 
the  FFP  assessed  146  of  the  200  nations  analyzed  in  this  study,  and  this  table  includes 
only  those  countries. 

Spearman’s Rank Correlation Coefficient, $\rho$, is calculated from the differences, d,
between the rankings assigned by the two methods:

$$\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

With n equal to 146, we obtain a value of 0.849, indicating a strong positive
correlation between the FFP and Function 2 country rankings. To see if this
correlation is statistically different from zero, we calculate the t-statistic

$$t = \frac{\rho}{\sqrt{(1 - \rho^2)/(n - 2)}}$$

for which we obtain a value of 19.28, which is significant at the 99% confidence level. In
other words, there appears to be a significant correlation between the FFP and Function 2
rankings, further indicating that our discriminant function may be useful for classifying
failing states.
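
The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used in the study: the function names are hypothetical, and the shortcut formula for the coefficient is exact only when neither ranking contains ties. Plugging in the reported values (a coefficient of 0.849 and n = 146) reproduces the reported t-statistic.

```python
import math
from typing import Sequence


def spearman_rho(ranks_a: Sequence[int], ranks_b: Sequence[int]) -> float:
    """Spearman's rho from two rankings of the same items (no ties assumed)."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same items")
    n = len(ranks_a)
    # Sum of squared rank differences, the d^2 column of the comparison table.
    d_squared_sum = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1.0 - 6.0 * d_squared_sum / (n * (n ** 2 - 1))


def t_statistic(rho: float, n: int) -> float:
    """t-statistic for testing whether rho differs from zero (n - 2 d.o.f.)."""
    return rho / math.sqrt((1.0 - rho ** 2) / (n - 2))


if __name__ == "__main__":
    # Toy rankings: identical orderings give +1, reversed orderings give -1.
    print(spearman_rho([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
    print(spearman_rho([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))
    # Values reported in this appendix.
    print(round(t_statistic(0.849, 146), 2))
```

With 144 degrees of freedom, a t-statistic of this size lies far beyond the usual critical values, which is why the correlation is reported as significant.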

Provided in the table are the FFP and Big Picture scores and rankings, and the
differences between the two.


                          Scores             Rankings         Differences
Observation                 FFP  Function 2  FFP  Function 2      d    d^2
Sudan                     112.3      -2.314    1          27    -26    676
Congo, DRC                110.1      -3.147    2           2      0      0
Cote d'Ivoire             109.2      -2.163    3          31    -28    784
Iraq                      109.0      -2.264    4          28    -24    576
Zimbabwe                  108.9      -2.860    5           9     -4     16
Chad                      105.9      -2.767    6          11     -5     25
Somalia                   105.9      -2.882    7           8     -1      1
Haiti                     104.6      -3.017    8           4      4     16
Pakistan                  103.1      -1.864    9          36    -27    729
Afghanistan                99.8      -2.560   10          15     -5     25
Guinea                     99.0      -2.393   11          24    -13    169
Liberia                    99.0      -2.396   12          23    -11    121
Central African Republic   97.5      -2.565   13          14     -1      1
Korea, North               97.3      -2.529   14          17     -3      9
Burundi                    96.7      -3.356   15           1     14    196
Sierra Leone               96.6      -3.120   16           3     13    169
Yemen                      96.6